Author: Joao Marques
License: Apache-2.0
Summary: Training-free KV cache compression via E8 lattice quantization and attention-aware token eviction
Latest version: 0.4.0
Required dependencies: numpy | torch | zstandard
Optional dependencies: accelerate | transformers
Downloads last day: 2
Downloads last week: 11
Downloads last month: 84