PyPI Stats

dolma


Author: None
License: Apache-2.0
Summary: Toolkit for pre-processing LLM training data.
Latest version: 1.2.1
Required dependencies: anyascii | blingfire | boto3 | cchardet | charset-normalizer | fasttext-wheel | faust-cchardet | fsspec | jq | jsonpath-ng | msgspec | necessary | nltk | numpy | omegaconf | platformdirs | python-dotenv | pyyaml | requests | rich | s3fs | smart-open | tokenizers | tqdm | uniseg | zstandard
Optional dependencies: beautifulsoup4 | black | brotli | detect-secrets | dolma | fasttext-wheel | fastwarc | flake8 | flake8-pyi | flake8-pyproject | htmldate | ipdb | ipython | isort | lingua-language-detector | mypy | py3langid | pycld2 | pygments | pytest | regex | resiliparse | trafilatura | types-dateparser | types-pyyaml | url-normalize | w3lib

Downloads last day: 118
Downloads last week: 2,073
Downloads last month: 7,654