Author: None
Summary: A modular pipeline for preprocessing scientific documents (PDF, DOCX, TEX, XML, TXT)
Latest version: 0.1.2
Required dependencies: nameparser | unidecode
Optional dependencies: black | faiss-cpu | lxml | mypy | nltk | opencv-python | pymupdf | pysbd | pytesseract | pytest | pytest-cov | python-docx | ruff | scikit-learn | scispacy | sentence-transformers | spacy
Downloads last day: 4
Downloads last week: 9
Downloads last month: 12