Author:
None
License:
TextWizard — Copyright (C) 2024–2025 Mattia Rubino
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public ...
Summary:
Extract, clean, and analyze text from PDFs, Office documents, images, and CSV/HTML files. Includes local OCR (Tesseract), Azure Document Intelligence, NER (spaCy/Stanza), language detection, spell-checking, text statistics, and HTML tools.
Latest version:
1.1.0
Required dependencies:
lxml, marisa-trie, openpyxl, pandas, platformdirs, pyahocorasick, pymupdf, pytesseract, regex, xlrd, zstandard
Optional dependencies:
azure-ai-documentintelligence, azure-core, spacy, spacy-stanza, stanza
Downloads last day:
0
Downloads last week:
13
Downloads last month:
43