Author:
None
License:
MIT
Summary:
Extract and convert PDF, Word, PowerPoint, Excel, images, and URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR.
Latest version:
1.1.8
Required dependencies:
beautifulsoup4 | docling-ibm-models | easyocr | flask | huggingface_hub | lxml | markdownify | mcp | numpy | openpyxl | pandas | pdf2image | pillow | pypandoc | python-docx | python-pptx | requests | setuptools | tiktoken | tokenizers | tqdm | transformers | wheel
Optional dependencies:
black | flake8 | flask | mypy | ollama | pytest | pytest-cov
Downloads last day:
48
Downloads last week:
247
Downloads last month:
1,719