Author: Alberto-Codes
License: Apache-2.0
Summary: TurboQuant KV cache compression for vLLM - fused Triton kernels, 3.76x compression, 3.7x faster decode on RTX 4090
Latest version: 1.5.0
Required dependencies: accelerate | einops | molmo-utils | scipy | torch | torchvision | transformers | vllm
Optional dependencies: bitsandbytes | mkdocs | mkdocs-gen-files | mkdocs-literate-nav | mkdocs-material | mkdocs-section-index | mkdocstrings | vllm
Downloads last day: 63
Downloads last week: 923
Downloads last month: 5,518