PyPI Stats

datasetpipeline


Author: None
License: MIT
Summary: A data processing and analysis pipeline designed to handle various jobs related to data transformation, quality assessment, deduplication, and formatting. The pipeline can be configured and executed using YAML configuration files.
Latest version: 0.2.1
Required dependencies: datasets | faiss-cpu | fuzzywuzzy | huggingface-hub | langchain-community | langchain-core | loguru | numpy | onnxruntime | openai | pandas | pydantic | python-levenshtein | retry | rich | ruamel-yaml | sqlalchemy | typer
Optional dependencies: black | build | faiss-gpu | flake8 | ipykernel | langchain-community | mypy | pytest | pytest-cov | sentence-transformers | tabulate | torch | transformers | twine

Downloads last day: 17
Downloads last week: 144
Downloads last month: 203