Author:
Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Shrimai Prabhumoye, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ryan Wolf
Summary:
Scalable Data Preprocessing Tool for Training Large Language Models
Latest version:
0.4.0
Required dependencies:
awscli | beautifulsoup4 | charset-normalizer | comment-parser | crossfit | cython | dask | dask-mpi | distributed | fasttext | ftfy | in-place | jieba | justext | mwparserfromhell | nemo-toolkit | numpy | openai | presidio-analyzer | presidio-anonymizer | pycld2 | resiliparse | spacy | unidic-lite | usaddress | warcio | zstandard
Optional dependencies:
cudf-cu12 | cugraph-cu12 | cuml-cu12 | dask-cuda | dask-cudf-cu12 | spacy
Downloads last day:
0
Downloads last week:
7
Downloads last month:
14