Author: Unstructured Technologies
License: Apache-2.0
Summary: A library that prepares raw documents for downstream ML tasks.
Latest version: 0.16.11
Required dependencies: backoff | beautifulsoup4 | chardet | dataclasses-json | emoji | filetype | html5lib | langdetect | lxml | nltk | numpy | psutil | python-iso639 | python-magic | python-oxmsg | rapidfuzz | requests | tqdm | typing-extensions | unstructured-client | wrapt
Optional dependencies: effdet | google-cloud-vision | langdetect | markdown | networkx | onnx | openpyxl | paddlepaddle | pandas | pdf2image | pdfminer.six | pi-heif | pikepdf | pypandoc | pypdf | python-docx | python-pptx | sacremoses | sentencepiece | torch | transformers | unstructured-inference | unstructured.paddleocr | unstructured.pytesseract | xlrd
Downloads last day: 50,401
Downloads last week: 463,975
Downloads last month: 2,381,815