PyPI Stats

nemo-curator


Author: not listed
Summary: Scalable Data Preprocessing Tool for Training Large Language Models
Latest version: 1.1.0
Required dependencies: absl-py | comment_parser | cosmos-xenna | flash-attn | fsspec | hydra-core | jieba | loguru | mecab-python3 | omegaconf | openai | pandas | pyarrow | pynvvideocodec | ray | torch | transformers | vllm
Optional dependencies: av | beautifulsoup4 | cudf-cu12 | cuml-cu12 | cvcuda_cu12 | easydict | einops | fasttext | ftfy | gpustat | justext | lxml | mwparserfromhell | nemo_curator | nemo_toolkit | nvidia-dali-cuda120 | nvidia-ml-py | opencv-python | peft | pycld2 | pycuda | pylibcugraph-cu12 | pylibraft-cu12 | raft-dask-cu12 | rapidsmpf-cu12 | resiliparse | s5cmd | scikit-learn | sentencepiece | torch | torchaudio | torchvision | trafilatura | warcio

Downloads last day: 20
Downloads last week: 413
Downloads last month: 3,215
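
The three counts cover windows of different lengths, so they are easier to compare as per-day averages. The sketch below does that arithmetic on the figures listed above. The window lengths are assumptions (in particular, "last month" is treated as 30 days), and the `daily_average` helper is a hypothetical name used only for illustration.

```python
# Download counts copied from the listing above.
downloads = {"last day": 20, "last week": 413, "last month": 3215}

# Assumed window lengths in days; the 30-day month is an assumption,
# not something the listing states.
window_days = {"last day": 1, "last week": 7, "last month": 30}


def daily_average(counts, days):
    """Return the mean downloads per day for each window."""
    return {name: counts[name] / days[name] for name in counts}


averages = daily_average(downloads, window_days)
for name, value in averages.items():
    print(f"{name}: {value:.1f} downloads/day")
```

Under these assumptions the weekly and monthly averages come out well above the single-day count, which suggests the most recent day was quieter than usual rather than representative.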