PyPI Stats

bpetokenizer


Author: Hrushikesh Dokala
License: MIT
Summary: A Byte Pair Encoding (BPE) tokenizer that algorithmically follows the GPT tokenizer (tiktoken) and lets you train your own tokenizer. It handles special tokens and uses a customizable regex pattern for tokenization (the GPT-4 regex pattern is included). It supports `save` and `load` for tokenizers in the `json` and `file` formats. `bpetokenizer` also supports [pretrained](bpetokenizer/pretrained/) tokenizers.
Latest version: 1.2.1
Required dependencies: regex
Optional dependencies: pytest | twine
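
For readers unfamiliar with the technique named in the summary, the following is a minimal sketch of byte-level BPE training, encoding, and decoding. It uses only the Python standard library and is not the `bpetokenizer` API: the function names (`train_bpe`, `encode`, `decode`, and the helpers) are hypothetical, and the sketch assumes no regex pre-splitting and no special tokens, both of which the package itself is described as supporting.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes (ids 0-255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, kept in training order
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand token ids back into bytes, then into text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


merges = train_bpe("aaabdaaabac", num_merges=3)
tokens = encode("aaabdaaabac", merges)
text = decode(tokens, merges)
```

In this sketch each merge adds one entry to the vocabulary, so the final vocabulary size is 256 plus the number of merges learned. A tokenizer that follows the GPT approach would typically split the text with a regex pattern first and run the merges within each chunk.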

Downloads last day: 2
Downloads last week: 42
Downloads last month: 138