Author: Uyen Hoang
Summary: PDF processing pipeline: remove headers/footers, convert to Markdown, and generate image captions
Latest version: 0.1.6
Required dependencies:
absl-py | accelerate | addict | aiofiles | aiohappyeyeballs | aiohttp | aioice | aiortc | aiosignal | alembic | annotated-doc | annotated-types | anthropic | antlr4-python3-runtime | anyio | asttokens | async-timeout | attrs | audioread | av | babel | backports-datetime-fromisoformat | backports.asyncio.runner | beautifulsoup4 | blis | braceexpand | brotli | cachetools | catalogue | certifi | cffi | cfgv | charset-normalizer | click | cloudpathlib | cloudpickle | coloredlogs | colorlog | confection | contourpy | cryptography | csvw | ctc_segmentation | curated-tokenizers | curated-transformers | cycler | cymem | cython | cytoolz | dacite | datasets | decorator | dill | distlib | distro | dlinfo | dnspython | docopt | editdistance | einops | einx | espeakng-loader | exceptiongroup | executing | fastapi | fastrtc | fastrtc-moonshine-onnx | ffmpy | fiddle | filelock | filetype | flatbuffers | fonttools | frozendict | frozenlist | fsspec | ftfy | future | gitdb | gitpython | google-auth | google-crc32c | google-genai | gradio | gradio_client | graphviz | groovy | grpcio | h11 | hf-xet | httpcore | httpx | huggingface-hub | humanfriendly | hydra-core | identify | idna | ifaddr | indic_numtowords | inflect | iniconfig | intervaltree | ipython | isodate | jedi | jinja2 | jiter | jiwer | joblib | jsonschema | jsonschema-specifications | kaldi-python-io | kiwisolver | language-tags | lazy_loader | levenshtein | lhotse | libcst | librosa | lightning | lightning-utilities | lilcom | llvmlite | loguru | mako | markdown | markdown-it-py | markdown2 | markdownify | marker-pdf | markupsafe | marshmallow | matplotlib | matplotlib-inline | mdurl | mediapy | misaki | mistral_common | ml_dtypes | mlx | mlx-audio | mlx-lm | mlx-metal | mlx-vlm | more-itertools | mpmath | msgpack | multidict | multiprocess | murmurhash | nemo-toolkit | networkx | nodeenv | num2words | numba | numpy | nv-one-logger-core | nv-one-logger-pytorch-lightning-integration | nv-one-logger-training-telemetry | omegaconf | onnx | onnxruntime | openai | opencv-python | opencv-python-headless | optuna | orjson | overrides | packaging | pandas | parso | pdftext | peft | pexpect | phonemizer-fork | pillow | plac | platformdirs | pluggy | pooch | pre_commit | preshed | prompt_toolkit | propcache | protobuf | psutil | ptyprocess | pure_eval | pyannote.core | pyannote.database | pyannote.metrics | pyarrow | pyasn1 | pyasn1_modules | pybind11 | pycountry | pycparser | pydantic | pydantic-extra-types | pydantic-settings | pydantic_core | pydub | pyee | pygments | pylibsrtp | pyloudnorm | pymupdf | pyopenssl | pyparsing | pypdfium2 | pytest | pytest-asyncio | python-dateutil | python-dotenv | python-multipart | pytorch-lightning | pytz | pyyaml | rapidfuzz | rdflib | referencing | regex | requests | resampy | rfc3986 | rich | rpds-py | rsa | ruamel.yaml | ruamel.yaml.clib | ruff | sacremoses | safehttpx | safetensors | scikit-learn | scipy | segments | semantic-version | sentencepiece | sentry-sdk | shellingham | six | smart_open | smmap | sniffio | sortedcontainers | sounddevice | soundfile | soupsieve | sox | soxr | spacy | spacy-curated-transformers | spacy-legacy | spacy-loggers | sqlalchemy | srsly | stack-data | starlette | strenum | surya-ocr | sympy | tabulate | tenacity | tensorboard | tensorboard-data-server | termcolor | text-unidecode | texterrors | thinc | threadpoolctl | tiktoken | tokenizers | toml | tomli | tomlkit | toolz | torch | torchmetrics | tqdm | traitlets | transformers | typeguard | typer | typer-slim | typing-inspection | typing_extensions | tzdata | uritemplate | urllib3 | uvicorn | virtualenv | wandb | wasabi | wcwidth | weasel | webdataset | webrtcvad | websockets | werkzeug | wget | whisper_normalizer | wrapt | xxhash | yarl
Optional dependencies:
black | pytest
Downloads last day: 0
Downloads last week: 189
Downloads last month: 196