Author: None
Summary: Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Latest version: 7.0.0
Required dependencies: accelerate | datasets | defuser | device-smi | dill | jinja2 | logbar | maturin | ninja | numpy | packaging | pillow | protobuf | pyarrow | pypcre | safetensors | threadpoolctl | tokenicer | torch | torchao | transformers
Optional dependencies: bitblas | bitsandbytes | evalution | fastapi | flashinfer-python | mlx_lm | nvidia-cublas | nvidia-cublas-cu12 | nvidia-cuda-runtime | nvidia-cuda-runtime-cu12 | nvidia-cusolver | nvidia-cusolver-cu12 | nvidia-cusparse | nvidia-cusparse-cu12 | optimum | parameterized | pydantic | pytest | pytest-timeout | ruff | sglang | triton | uvicorn | vllm
Downloads last day: 779
Downloads last week: 7,088
Downloads last month: 34,935