Author:
Nameet Potnis
Summary:
Self-healing PDF extraction for RAG. Per-page confidence scoring, auto re-extracts bad pages, MCP server, LangChain/LlamaIndex loaders. LlamaParse alternative, #2 on opendataloader-bench.
Latest version:
1.6.4
Required dependencies:
pymupdf | pymupdf4llm | python-bidi | rich | typer
Optional dependencies:
anthropic | docling | google-genai | langchain-core | llama-index-core | marker-pdf | mcp | mistralai | ollama | onnxruntime | openai | opendataloader-pdf | pytest | pytest-asyncio | rapidocr | ruff | surya-ocr | uvicorn | watchdog
Downloads last day:
41
Downloads last week:
555
Downloads last month:
1,183