Author: Alexander Wettig
Summary: Library and scripts for common LM data utilities (tokenizing, splitting, packing, ...)
Latest version: 0.5
Required dependencies: fsspec | mosaicml-streaming | numpy | pyarrow | sentencepiece | simple_parsing | tqdm | universal-pathlib | zstandard
Optional dependencies: datasets
Downloads last day: 10
Downloads last week: 22
Downloads last month: 53