PyPI page
Home page
Author: None
License: Apache 2.0
Summary: Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
Latest version: 2.0.0
Required dependencies: cchardet | certifi | charset_normalizer | courlan | faust-cchardet | htmldate | justext | lxml | urllib3
Optional dependencies: brotli | flake8 | htmldate | mypy | py3langid | pycurl | pytest | pytest-cov | types-lxml | types-urllib3 | urllib3 | zstandard
Downloads last day: 21,458
Downloads last week: 108,335
Downloads last month: 1,088,540