Author: David Van de Ven
License: GPL3
Summary: Automatically caption images using various LLaVA multimodal models. This tool processes images with state-of-the-art vision language models to generate accurate, high-quality captions.
Latest version: 0.8.0
Required dependencies: httpx | huggingface-hub | json-repair | llama-cpp-python | mlx | mlx-vlm | ollama | pandas | pillow | requests | torch | tqdm | transformers
Downloads last day: 36
Downloads last week: 457
Downloads last month: 501