Learn about language model tokenization
OpenAI’s large language models process text using tokens, which are common sequences of characters found in a set of text. The models learn the statistical relationships between these tokens and excel at predicting the next token in a sequence.
You can use the Tokenizer tool to see how a piece of text might be tokenized by a language model, and the total count of tokens in that piece of text.
A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words).
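As a quick illustration of that arithmetic, here is a minimal Python sketch of the heuristic. The `estimate_tokens` helper is hypothetical (not part of any OpenAI library), and real counts vary with the text and the encoding:

```python
def estimate_tokens(text: str) -> int:
    """Estimate the token count as roughly one token per ~4 characters."""
    return max(1, round(len(text) / 4))

sample = "A helpful rule of thumb is that one token is about four characters."
print(estimate_tokens(sample))  # 67 characters -> prints 17
```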
If you need a programmatic interface for tokenizing text, check out our tiktoken package for Python. For JavaScript, the community-supported @dqbd/tiktoken package works with most GPT models.
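For instance, counting tokens with tiktoken in Python might look like the following sketch (assuming tiktoken is installed; `cl100k_base` is one of its built-in encodings, and `tiktoken.encoding_for_model` can pick the right encoding for a given model name):

```python
import tiktoken

# Load an encoding by name; "cl100k_base" is a built-in tiktoken encoding.
enc = tiktoken.get_encoding("cl100k_base")

text = "OpenAI's large language models process text using tokens."
token_ids = enc.encode(text)

print(token_ids)       # the integer token IDs
print(len(token_ids))  # the total token count for this text
assert enc.decode(token_ids) == text  # decoding round-trips back to the text
```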