PyPI page
Home page
Author: None
Summary: Tokeniser toolkit: a collection of Pythonic subword tokenisers and text preprocessing tools.
Latest version: 2026.5.1
Required dependencies: bpeasy | clavier_lib | dacite | datasets | evaluate | fugashi | ipadic | langcodes | nlpaug | pythainlp | regex | sentencepiece | tokenizers
Optional dependencies: bpe_knockout | fiject | modest_bauwenst | pickybpe_bauwenst | sage_bauwenst | transformers | wtpsplit
Downloads last day: 13
Downloads last week: 152
Downloads last month: 478