pdf2dataset

Author: Ícaro Pires
License: Apache-2.0
Summary: Easily convert a subdirectory containing a large volume of PDF documents into a dataset; supports extracting text and images
Latest version: 0.5.3
Required dependencies: dask | more-itertools | opencv-python | packaging | pandas | pdf2image | pdftotext | pyarrow | pytesseract | ray | tqdm

Downloads last day: 2
Downloads last week: 22
Downloads last month: 329