Author: Ken Obata
Summary: Partition-aware MinHash LSH deduplication for large-scale text data curation on Apache Spark
Latest version: 0.1.4
Required dependencies: mmh3 | numpy | pyspark
Optional dependencies: boto3 | build | pandas | pytest | ruff | twine
Downloads last day: 15
Downloads last week: 177
Downloads last month: 316