Author:
Adrien Barbaresi
Summary:
Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
Latest version:
2.1.0
Required dependencies:
certifi, charset_normalizer, courlan, htmldate, justext, lxml, urllib3
Optional dependencies:
beautifulsoup4, boilerpy3, brotli, docutils, faust-cchardet, goose3, html-text, html2text, htmldate, inscriptis, mypy, news-please, newspaper3k, pandas, py3langid, pycurl, pydata-sphinx-theme, pytest, pytest-cov, readability-lxml, resiliparse, ruff, sphinx, sphinx-sitemap, tabulate, tqdm, types-lxml, types-pycurl, types-urllib3, urllib3, zstandard
Downloads last day:
199,310
Downloads last week:
2,248,821
Downloads last month:
8,977,922