Tokenizer vocabulary
p50k_base
OpenAI byte-pair encoding (BPE) used by Codex-era models. It extends the GPT-2 vocabulary to better handle code. This entry lists the vocabulary size, token ranges, and special-token IDs.
OpenAI
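The properties this entry describes (vocabulary size and special-token IDs) can also be read programmatically. The sketch below is one possible way to do that, assuming the open-source tiktoken package is installed and able to fetch or load its cached vocabulary file. VocabSummary, summarize, and load_p50k_base are hypothetical helper names introduced for illustration only. No expected values are shown, since they should be checked against the loaded encoding itself.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet


@dataclass(frozen=True)
class VocabSummary:
    """Hypothetical container for the headline facts about an encoding."""
    name: str
    n_vocab: int                  # size of the ID space (largest token ID + 1)
    eot_token: int                # ID of the end-of-text special token
    special_tokens: FrozenSet[str]


def summarize(enc: Any) -> VocabSummary:
    """Collect vocabulary facts from an encoding-like object.

    Assumes `enc` exposes `name`, `n_vocab`, `eot_token`, and
    `special_tokens_set`, as tiktoken's Encoding objects do.
    """
    return VocabSummary(
        name=enc.name,
        n_vocab=enc.n_vocab,
        eot_token=enc.eot_token,
        special_tokens=frozenset(enc.special_tokens_set),
    )


def load_p50k_base() -> Any:
    """Load p50k_base through tiktoken (imported lazily; assumed installed)."""
    import tiktoken

    return tiktoken.get_encoding("p50k_base")
```

With tiktoken available, print(summarize(load_p50k_base())) would report the name, vocabulary size, end-of-text ID, and special-token names for this encoding. Keeping summarize separate from the loader means it can be checked against any object with the same four attributes, without network access.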