js-tiktoken
js-tiktoken is a JavaScript version of the BPE tokenizer created by OpenAI. We can use tiktoken to estimate the tokens used with the @[TokenTextSplitter]. It will probably be more accurate for OpenAI models.
- How the text is split: by the character passed in.
- How the chunk size is measured: by the tiktoken tokenizer.
To split with tiktoken, pass in an encodingName (e.g. cl100k_base) when initializing the @[TokenTextSplitter]. Note that splits from this method can be larger than the chunk size measured by the tiktoken tokenizer.
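To see why a split can exceed the chunk size, here is a minimal, self-contained sketch of the "split by a character, measure by tokens" idea. The text is split on a separator, and the pieces are greedily merged while the merged text stays within `chunkSize` tokens. A piece that is already too large on its own is emitted unchanged. This is an illustration, not LangChain's implementation. `splitAndMerge` and `toyTokenCounter` are hypothetical names, and the toy counter, which treats whitespace-separated words as tokens, stands in for a real tiktoken encoding such as cl100k_base.

```typescript
// Hypothetical stand-in for a tiktoken encoder (assumption): it counts
// whitespace-separated words as "tokens". A real setup would count tokens
// with a tiktoken encoding such as cl100k_base instead.
type TokenCounter = (text: string) => number;

const toyTokenCounter: TokenCounter = (text) =>
  text.split(/\s+/).filter((word) => word.length > 0).length;

// Split `text` on `separator`, then greedily merge consecutive pieces into
// chunks whose token count (as measured by `countTokens`) stays within
// `chunkSize`. A single piece that already exceeds `chunkSize` becomes its
// own chunk, so individual chunks can be larger than the limit.
function splitAndMerge(
  text: string,
  separator: string,
  chunkSize: number,
  countTokens: TokenCounter,
): string[] {
  const pieces = text.split(separator).filter((piece) => piece.length > 0);
  const chunks: string[] = [];
  let current: string[] = [];

  for (const piece of pieces) {
    const candidate = [...current, piece].join(separator);
    if (current.length > 0 && countTokens(candidate) > chunkSize) {
      // Adding this piece would overflow: close the current chunk.
      chunks.push(current.join(separator));
      current = [piece];
    } else {
      current.push(piece);
    }
  }
  if (current.length > 0) {
    chunks.push(current.join(separator));
  }
  return chunks;
}

const text = "a b\n\nc\n\nd e f g h\n\ni";
const chunks = splitAndMerge(text, "\n\n", 3, toyTokenCounter);
// chunks: ["a b\n\nc", "d e f g h", "i"]
console.log(chunks);
```

The first two pieces fit together within 3 tokens and are merged. The piece "d e f g h" measures 5 tokens, but because it was never split further it is kept as one oversized chunk, which mirrors the caveat about splits exceeding the chunk size.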