Author:
None
Summary:
Run 70B+ LLMs on a single 4GB GPU — no quantization required. Layer-streaming inference for consumer hardware.
Latest version:
1.1.0
Required dependencies:
accelerate | bitsandbytes | einops | huggingface-hub | safetensors | scipy | sentencepiece | tiktoken | torch | tqdm | transformers | transformers-stream-generator
Optional dependencies:
bitsandbytes | flash-attn | kvikio-cu12
Downloads last day:
2
Downloads last week:
135
Downloads last month:
234