Author:
None
License:
Apache-2.0
Summary:
HuggingFace library to process and filter large amounts of webdata
Latest version:
0.9.0
Required dependencies:
dill | fsspec | huggingface-hub | humanize | loguru | multiprocess | numpy | tqdm
Optional dependencies:
aiofiles | aiosqlite | bitsandbytes | botok | datasets | datatrove | fasteners | fasttext-numpy2-wheel | faust-cchardet | flask | ftfy | httpx | indic-nlp-library | inscriptis | jieba | khmer-nltk | kiwipiepy | laonlp | lighteval | moto | nltk | numpy | orjson | pandas | pyahocorasick | pyarrow | pyidaungsu-numpy2 | pytest | pytest-rerunfailures | pytest-timeout | pytest-xdist | pythainlp | python-magic | pyvi | pyyaml | ray | regex | rich | ruff | s3fs | sglang | spacy | stanza | tensorflow | tldextract | tokenizers | trafilatura | transformers | typer | urduhack | vllm | warcio | xxhash | zstandard
Downloads last day:
1,475
Downloads last week:
9,487
Downloads last month:
51,967