Author: Zacharie B
License: Apache-2.0
Summary: Lightweight data quality toolkit for LLM instruction tuning. Deduplication, PII detection, contamination checking, and quality scoring; no GPU required.
Latest version: 0.4.0
Required dependencies: xxhash
Optional dependencies: click, datasketch, numpy, pyarrow, rich, sentence-transformers
Downloads last day: 5
Downloads last week: 55
Downloads last month: 289