PyPI page
Home page
Author:
The MInference team
License:
MIT License
Summary:
To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
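As a conceptual illustration of the dynamic sparse attention idea the summary describes (not MInference's actual optimized Triton kernels), the sketch below selects, per query, only the top-k highest-scoring keys and restricts the softmax and weighted sum to that sparse set; the function name and its pure-Python shape are illustrative assumptions.

```python
import math

def sparse_attention(q, k, v, top_k=2):
    """Toy dynamic sparse attention (illustrative, not MInference's API).

    q, k, v: lists of equal-length float vectors.
    Returns one output vector per query, attending only to each
    query's top_k highest-scoring keys instead of all keys.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # Dense scores, scaled as in standard attention.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                  for kj in k]
        # Dynamically select the top_k key indices for this query.
        kept = sorted(range(len(k)), key=lambda i: scores[i],
                      reverse=True)[:top_k]
        # Softmax restricted to the kept indices only.
        m = max(scores[i] for i in kept)
        exps = {i: math.exp(scores[i] - m) for i in kept}
        z = sum(exps.values())
        out.append([sum(exps[i] / z * v[i][j] for i in kept)
                    for j in range(d)])
    return out
```

Because each query attends to only top_k keys rather than all of them, the attention cost drops from quadratic toward linear in sequence length, which is the source of the pre-filling speedup claimed above.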
Latest version:
0.1.6.0
Required dependencies:
einops | torch | transformers | triton
Optional dependencies:
black | einops | flake8 | isort | pre-commit | pytest | pytest-xdist | torch | transformers | triton
Downloads last day:
37
Downloads last week:
117
Downloads last month:
330