Author:
None
License:
MIT
Summary:
Ingestion (web/PDF/DOCX/TXT), cleaning, paragraph-level language identification (LID) for PT/EN/ES, and spaCy-based normalization, with PDF export.
Latest version:
0.2.7
Required dependencies:
clean-text | fasttext | fasttext-wheel | ftfy | justext | lxml | numpy | pdfminer.six | python-docx | reportlab | requests | spacy | thinc | trafilatura | unidecode
Optional dependencies:
pycld3
Downloads last day:
6
Downloads last week:
22
Downloads last month:
27
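
Example (illustrative only):
The summary mentions cleaning followed by paragraph-level language identification for PT/EN/ES. The sketch below shows that idea using only the Python standard library. It is not this package's API or method: the function names (`clean`, `split_paragraphs`, `guess_language`, `label_paragraphs`) and the tiny stopword lists are hypothetical, and a stopword-count heuristic stands in for a trained model such as the fasttext dependency listed above.

```python
import re
import unicodedata

# Tiny illustrative stopword sets (assumption); a real system would use a trained model.
STOPWORDS = {
    "pt": {"o", "os", "as", "do", "da", "que", "não", "uma", "um", "para", "com", "em", "é", "e", "de"},
    "en": {"the", "of", "and", "to", "in", "is", "that", "it", "for", "with", "was", "on"},
    "es": {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una", "es", "por", "con", "no"},
}


def clean(text):
    """Normalize Unicode and line endings, and collapse runs of spaces and tabs."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def guess_language(paragraph):
    """Return 'pt', 'en', or 'es' by stopword count, or 'und' if nothing matches."""
    tokens = re.findall(r"[^\W\d_]+", paragraph.lower())
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "und"


def label_paragraphs(text):
    """Clean the text, then pair each paragraph with its guessed language code."""
    return [(guess_language(p), p) for p in split_paragraphs(clean(text))]


if __name__ == "__main__":
    sample = (
        "The cat is on the roof of the house.\n\n"
        "O gato está no telhado da casa e não quer descer.\n\n"
        "El gato está en el tejado de la casa y no quiere bajar."
    )
    for lang, paragraph in label_paragraphs(sample):
        print(lang, "|", paragraph)
```

Labeling each paragraph separately, rather than the whole document, lets mixed-language input be routed to a different language-specific normalization step per paragraph.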