Author:
None
Summary:
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Latest version:
5.4.2
Required dependencies:
accelerate | datasets | device-smi | dill | hf_transfer | huggingface_hub | logbar | maturin | numpy | packaging | pillow | protobuf | pyarrow | pypcre | random_word | safetensors | threadpoolctl | tokenicer | torch | torchao | transformers
Optional dependencies:
bitblas | evalplus | fastapi | flashinfer-python | lm_eval | mlx_lm | optimum | parameterized | pydantic | pytest | pytest-timeout | ruff | sglang | triton | uvicorn | vllm
Downloads last day:
722
Downloads last week:
3,934
Downloads last month:
15,882