PyPI page
Home page
Author:
The MInference team
License:
MIT License
Summary:
To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
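As a conceptual illustration of the dynamic sparse attention idea the summary describes (not MInference's actual optimized Triton kernels), the sketch below selects, per query, only the top-k highest-scoring keys and restricts the softmax and weighted sum to that sparse set; the function name and its pure-Python shape are illustrative assumptions.

```python
import math

def sparse_attention(q, k, v, top_k=2):
    """Toy dynamic sparse attention (illustrative, not MInference's API).

    q, k, v: lists of equal-length float vectors.
    Returns one output vector per query, attending only to each
    query's top_k highest-scoring keys instead of all keys.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # Dense scores, scaled as in standard attention.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                  for kj in k]
        # Dynamically select the top_k key indices for this query.
        kept = sorted(range(len(k)), key=lambda i: scores[i],
                      reverse=True)[:top_k]
        # Softmax restricted to the kept indices only.
        m = max(scores[i] for i in kept)
        exps = {i: math.exp(scores[i] - m) for i in kept}
        z = sum(exps.values())
        out.append([sum(exps[i] / z * v[i][j] for i in kept)
                    for j in range(d)])
    return out
```

Because each query attends to only top_k keys rather than all of them, the attention cost drops from quadratic toward linear in sequence length, which is the source of the pre-filling speedup claimed above.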
Latest version:
0.1.6.0
Required dependencies:
einops | torch | transformers | triton
Optional dependencies:
black | einops | flake8 | isort | pre-commit | pytest | pytest-xdist | torch | transformers | triton
Downloads last day:
37
Downloads last week:
117
Downloads last month:
330