Today, Cohere is introducing Structured Outputs, a feature that ensures outputs from Cohere's Command R series of models adhere to a user-defined response format. Structured Outputs will initially support JSON response format, including user-defined JSON schemas, with plans to expand to other structured output formats. With this new capability, Cohere is making generative LLM outputs even more reliable.
Transform data for programmatic usage
Structured Outputs enables developers to consistently generate model outputs for programmatic use and reliable function calls. Some examples include:
- Extracting data: Generating structured output lets users extract the same fields from free-form text into a standard JSON object, which is helpful for storing and comparing data values and automates the text-to-JSON conversion process.
- Formulating queries: Using Structured Outputs in JSON helps to ensure the model produces valid data that other software can reliably use.
- Displaying model outputs in the UI: JSON objects provide developers with maximum control over how they want attributes to be displayed in their UI.
Imagine a customer that needs to extract information such as name, email, education, and location from hundreds or thousands of resumes. For this task, the model needs to generate structured data for every resume. A consistent format ensures that the company's downstream systems can process this data reliably. With Structured Outputs in JSON, the company can force the LLM output to follow a valid schema for each data extraction.
Here's a sample JSON schema setup:
POST https://api.cohere.ai/v1/chat
{
  "message": "Given the following resume from a job application, generate a JSON object by extracting the following information for the applicant: their email, previous employer, location, number of years of experience, and a list of the skills they possess.\n\n <resume>",
  "response_format": {
    "type": "json_object",
    "schema": {
      "type": "object",
      "properties": {
        "email": {
          "type": "string"
        },
        "previous_employer": {
          "type": "string"
        },
        "location": {
          "type": "string"
        },
        "years_of_experience": {
          "type": "integer"
        },
        "skills": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": [
        "email",
        "previous_employer",
        "location",
        "years_of_experience",
        "skills"
      ]
    }
  }
}
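For illustration, a response that satisfies this schema could look like the following (the values are made up):

{
  "email": "jane.doe@example.com",
  "previous_employer": "Acme Corp",
  "location": "Toronto, Canada",
  "years_of_experience": 7,
  "skills": ["Python", "SQL", "data analysis"]
}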
How do Structured Outputs work?
LLMs generate output text auto-regressively, one token at a time: at each step, a sampling procedure converts the model's probability distribution over all tokens into a single selected token. For structured output generation, we modify this sampling step so that it only emits tokens consistent with the prescribed format. We briefly describe how below.
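As a rough sketch of ordinary, unconstrained sampling, assuming a hypothetical model object that exposes a next-token distribution:

import numpy as np

def sample_unconstrained(model, prompt_tokens, eos_id, max_new_tokens=256):
    # Autoregressive decoding: at each step the model produces a probability
    # distribution over the whole vocabulary, and the sampler picks one token.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model.next_token_probs(tokens)  # hypothetical call; shape (vocab_size,)
        next_id = int(np.random.choice(len(probs), p=probs))
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens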
For responses that must adhere to the user-defined format specified in the `response_format` parameter, we construct a finite state machine (FSM) that accepts only token sequences consistent with that format. We rely on an optimized version of the Outlines library for reliable parsing and FSM construction.
Specifically, the FSM can be represented as a directed graph in which each node represents the partial generation accepted so far, and the outgoing edges from a node represent the tokens that are acceptable from that state under the user-provided format.
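As a toy illustration (not Outlines' actual data structures), such a graph can be stored as a mapping from each state to its outgoing edges, keyed by the token each edge accepts. Here the format is simply the two tokens "{" then "}":

# Each state (node) maps an acceptable next token (edge) to the next state.
fsm = {
    "start": {"{": "open"},   # from the start, only "{" is acceptable
    "open":  {"}": "done"},   # after "{", only "}" is acceptable
    "done":  {},              # terminal state: the format is complete
}

def allowed_tokens(state):
    # The outgoing edges of the current node are exactly the valid next tokens.
    return set(fsm[state])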
While several open-source libraries can help generate structured outputs, our testing showed that they all degrade model performance. To avoid this, we implemented a number of engineering optimizations that let us construct these FSMs from JSON schemas efficiently and at scale, up to 80x faster than open-source alternatives.
During the decoding phase, instead of directly sampling from the probability distribution emitted by the LLM, our sampling strategy uses the FSM to determine the space of valid tokens and mutates the distribution by pinning the probability of every invalid token to zero. This ensures that the sampler only picks tokens accepted by the FSM, so the output is guaranteed to adhere to the prescribed response format. Thanks to various system optimizations, these additional acceptance checks add almost zero overhead over the vanilla sampling strategy.
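A minimal sketch of that masking step, assuming the set of token IDs the FSM accepts in the current state is already known (an illustration, not Cohere's implementation):

import numpy as np

def sample_constrained(probs, valid_token_ids):
    # Pin the probability of every token the FSM rejects to zero, renormalize,
    # and sample: only format-consistent tokens can ever be selected.
    mask = np.zeros_like(probs)
    mask[list(valid_token_ids)] = 1.0
    constrained = probs * mask
    constrained /= constrained.sum()  # assumes at least one valid token has nonzero mass
    return int(np.random.choice(len(constrained), p=constrained))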
Get started
To get started with Structured Outputs in JSON mode, check out our documentation, or enable JSON mode in our Playground.
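For example, here is a minimal Python sketch that calls the Chat endpoint shown above with JSON mode enabled; it assumes a COHERE_API_KEY environment variable, and the model name is illustrative:

import os
import requests

response = requests.post(
    "https://api.cohere.ai/v1/chat",
    headers={
        "Authorization": f"Bearer {os.environ['COHERE_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "command-r-plus",  # illustrative model name
        "message": "Extract the applicant's email, location, and years of experience "
                   "as a JSON object.\n\n<resume>",
        "response_format": {
            "type": "json_object",
            "schema": {
                "type": "object",
                "properties": {
                    "email": {"type": "string"},
                    "location": {"type": "string"},
                    "years_of_experience": {"type": "integer"},
                },
                "required": ["email", "location", "years_of_experience"],
            },
        },
    },
)
print(response.json()["text"])  # the reply text, which will be valid JSON per the schema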