PyPI Stats

trafilatura

Author: Adrien Barbaresi
Summary: Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
Latest version: 2.2.0
Required dependencies: certifi | charset_normalizer | courlan | htmldate | justext | lxml | urllib3
Optional dependencies: beautifulsoup4 | boilerpy3 | brotli | docutils | faust-cchardet | goose3 | html-text | html2text | htmldate | inscriptis | mypy | news-please | newspaper3k | pandas | py3langid | pycurl | pydata-sphinx-theme | pytest | pytest-cov | readability-lxml | resiliparse | ruff | sphinx | sphinx-sitemap | tabulate | tqdm | types-lxml | types-pycurl | types-urllib3 | urllib3 | zstandard

Downloads last day: 505,554
Downloads last week: 3,034,505
Downloads last month: 12,907,693
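
To put the three figures on a common scale, the counts can be converted to per-day averages. The snippet below is a minimal sketch, not part of PyPI Stats itself: it assumes "last week" spans 7 days and "last month" spans 30 days, and the variable names are illustrative only.

```python
# Download counts copied from the figures above.
last_day = 505_554
last_week = 3_034_505
last_month = 12_907_693

# Assumption: the weekly window is 7 days and the monthly window is 30 days.
DAYS_IN_WEEK = 7
DAYS_IN_MONTH = 30

# Per-day averages over each window.
weekly_daily_avg = last_week / DAYS_IN_WEEK
monthly_daily_avg = last_month / DAYS_IN_MONTH

# How the most recent day compares with the weekly average (1.0 = same).
day_vs_week = last_day / weekly_daily_avg

print(f"Average per day (last week):  {weekly_daily_avg:,.0f}")
print(f"Average per day (last month): {monthly_daily_avg:,.0f}")
print(f"Last day vs. weekly average:  {day_vs_week:.2f}x")
```

Under these assumptions the weekly and monthly averages land close to each other, while the most recent day sits somewhat above both.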