Token category
o200k_base Multilingual
Multilingual text often splits into script-specific subwords, punctuation, and byte fragments. The same sentence can tokenize very differently across languages and writing systems.
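One reason for the difference can be seen before any tokenizer runs. The sketch below is not the o200k_base tokenizer. It assumes a byte-level tokenizer that starts from UTF-8 bytes and only measures how many bytes each script needs per character. The helper name `utf8_profile` and the sample strings are illustrative and do not come from the page.

```python
def utf8_profile(text: str) -> dict:
    """Return the code point count, UTF-8 byte count, and bytes per code point."""
    n_chars = len(text)                  # Python counts Unicode code points
    n_bytes = len(text.encode("utf-8"))  # raw bytes a byte-level tokenizer would start from
    ratio = n_bytes / n_chars if n_chars else 0.0
    return {"chars": n_chars, "bytes": n_bytes, "bytes_per_char": ratio}


# Illustrative samples: a short greeting in several scripts, plus one emoji.
samples = {
    "English": "Hello",
    "Russian": "Привет",
    "Chinese": "你好",
    "Emoji": "🙂",
}

for name, text in samples.items():
    p = utf8_profile(text)
    print(f"{name:8} chars={p['chars']} bytes={p['bytes']} "
          f"bytes/char={p['bytes_per_char']:.1f}")
```

Scripts that need more bytes per character give a byte-level tokenizer more raw material to merge. When the vocabulary has no learned merge covering a sequence, the leftover bytes can surface as the byte fragments mentioned above.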