textwizard

Author: None
License: TextWizard, Copyright (C) 2024–2025 Mattia Rubino. This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public ...
Summary: Extract, clean, and analyze text from PDFs, Office docs, images, CSV/HTML. Local OCR (Tesseract), Azure DI, NER (spaCy/Stanza), language detection, spell-check, statistics, and HTML tools.
Latest version: 1.1.0
Required dependencies: lxml | marisa-trie | openpyxl | pandas | platformdirs | pyahocorasick | pymupdf | pytesseract | regex | xlrd | zstandard
Optional dependencies: azure-ai-documentintelligence | azure-core | spacy | spacy-stanza | stanza

Downloads last day: 1
Downloads last week: 37
Downloads last month: 94