Author:
None
License:
CC-BY-NC-SA-4.0
Summary:
Data processing pipeline with deduplication, stemming, quality checking, and readability scoring, used for the DALLA Models
Latest version:
0.0.11
Required dependencies:
click | datasets | pyarrow | structlog | tqdm | transformers
Optional dependencies:
camel-tools | cffi | dalla-data-processing | pre-commit | pytest | pytest-cov | pyyaml | ruff | sentencepiece | textstat
Downloads last day:
11
Downloads last week:
47
Downloads last month:
85