PyPI page
Author: Gavin Li
Summary: AirLLM allows a single 4GB GPU to run 70B large language models without quantization, distillation, or pruning; 8GB of VRAM is enough to run the 405B Llama 3.1.
Latest version: 2.11.0
Required dependencies: accelerate | huggingface-hub | optimum | safetensors | scipy | torch | tqdm | transformers
Downloads last day: 453
Downloads last week: 3,185
Downloads last month: 16,447