Author:
None
Summary:
Scalable Data Preprocessing Tool for Training Large Language Models
Latest version:
1.1.0
Required dependencies:
absl-py, comment_parser, cosmos-xenna, flash-attn, fsspec, hydra-core, jieba, loguru, mecab-python3, omegaconf, openai, pandas, pyarrow, pynvvideocodec, ray, torch, transformers, vllm
Optional dependencies:
av, beautifulsoup4, cudf-cu12, cuml-cu12, cvcuda_cu12, easydict, einops, fasttext, ftfy, gpustat, justext, lxml, mwparserfromhell, nemo_curator, nemo_toolkit, nvidia-dali-cuda120, nvidia-ml-py, opencv-python, peft, pycld2, pycuda, pylibcugraph-cu12, pylibraft-cu12, raft-dask-cu12, rapidsmpf-cu12, resiliparse, s5cmd, scikit-learn, sentencepiece, torch, torchaudio, torchvision, trafilatura, warcio
Downloads last day:
20
Downloads last week:
413
Downloads last month:
3,215