Author:
Ícaro Pires
License:
Apache-2.0
Summary:
Easily convert a subdirectory containing a large volume of PDF documents into a dataset; supports extracting text and images.
Latest version:
0.5.3
Required dependencies:
dask | more-itertools | opencv-python | packaging | pandas | pdf2image | pdftotext | pyarrow | pytesseract | ray | tqdm
Downloads last day:
1
Downloads last week:
8
Downloads last month:
41