Tokenizer vocabulary

p50k_base

OpenAI byte-pair encoding used by Codex-era models. It extends the GPT-2 vocabulary for code. Vocabulary size, token ranges, and special-token IDs are listed here.


Creator: OpenAI
Mergeable tokens: 50,280
Total known tokens: 50,281
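
The two counts above differ by one. A minimal sketch of how they relate, assuming the gap is made up entirely of special tokens (the page does not say so explicitly). The constant and function names are illustrative. The optional cross-check assumes the `tiktoken` package is installed and can fetch the vocabulary file, so it is only called on demand.

```python
# Counts copied from the listing above.
MERGEABLE_TOKENS = 50_280
TOTAL_KNOWN_TOKENS = 50_281


def special_token_count(mergeable: int, total: int) -> int:
    """Tokens not accounted for by BPE merges (assumed to be special tokens)."""
    if total < mergeable:
        raise ValueError("total cannot be smaller than the mergeable count")
    return total - mergeable


def matches_tiktoken() -> bool:
    """Optional cross-check against tiktoken; needs the package and network access."""
    import tiktoken  # imported lazily so the rest of the sketch runs without it

    enc = tiktoken.get_encoding("p50k_base")
    # n_vocab is the highest token ID plus one.
    return enc.n_vocab == TOTAL_KNOWN_TOKENS


if __name__ == "__main__":
    print(special_token_count(MERGEABLE_TOKENS, TOTAL_KNOWN_TOKENS))
```

Calling `matches_tiktoken()` is left to the reader, since it depends on the local environment.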
