PyPI Stats

distributed-curator

Author: Ken Obata
Summary: Partition-aware MinHash LSH deduplication for large-scale text data curation on Apache Spark
Latest version: 0.1.4
Required dependencies: mmh3 | numpy | pyspark
Optional dependencies: boto3 | build | pandas | pytest | ruff | twine

Downloads last day: 15
Downloads last week: 177
Downloads last month: 316
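
The summary describes the package as MinHash LSH deduplication. As a rough illustration of that general technique only, here is a minimal single-machine sketch in plain Python. It is not distributed-curator's API and says nothing about how the package partitions work on Spark. Every function name is hypothetical. The shingle size (5 characters), the 128 permutations, the 32-band by 4-row split, and the use of `hashlib.blake2b` as a stand-in for `mmh3` are all assumptions chosen for the example.

```python
import hashlib
import random
from collections import defaultdict

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the universal hash family


def shingles(text, k=5):
    """Character k-shingles of a lowercased, whitespace-normalized string (hypothetical helper)."""
    t = " ".join(text.lower().split())
    if len(t) < k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def base_hash(s):
    """Stable 64-bit hash of a shingle (assumption: blake2b stands in for mmh3)."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def make_permutations(num_perm=128, seed=1):
    """Random (a, b) pairs defining the hash functions h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
            for _ in range(num_perm)]


def minhash(text, perms, k=5):
    """MinHash signature: for each hash function, the minimum value over all shingles."""
    hashes = [base_hash(s) % MERSENNE_PRIME for s in shingles(text, k)]
    return tuple(min((a * h + b) % MERSENNE_PRIME for h in hashes) for a, b in perms)


def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands, rows):
    """LSH banding: documents sharing any full band of `rows` values become candidates."""
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs
```

For example, with `perms = make_permutations(128)`, signatures built by `minhash` for a near-duplicate pair of sentences (differing by one trailing character) should land in at least one shared bucket of `lsh_candidate_pairs(sigs, bands=32, rows=4)`, while an unrelated sentence should not. The band and row counts trade recall against the number of candidate pairs. Candidates would normally be verified with `estimated_jaccard` or an exact comparison before any document is dropped.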