Structured Output

How can an LLM follow the format defined for structured output?

One good explanation is this YouTube video.

1 OpenAI API and Outlines lib

The OpenAI API takes the output format as a Pydantic model, while Outlines takes a regular expression.

A finite state machine is maintained for the regular expression. You track which state the generated tokens have reached, and check whether each candidate next token is valid from that state.
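To make the state tracking concrete, here is a minimal sketch in plain Python. It is not how Outlines is implemented. The regex `[0-9]+\.[0-9]+`, the hand-written transition table, and the tiny vocabulary are all assumptions picked for illustration.

```python
# Hand-built DFA for the regex [0-9]+\.[0-9]+ (hypothetical example format).
DIGITS = "0123456789"
TRANSITIONS = {
    0: {d: 1 for d in DIGITS},                # start: need a digit
    1: {**{d: 1 for d in DIGITS}, ".": 2},    # integer part: digit or "."
    2: {d: 3 for d in DIGITS},                # just after ".": need a digit
    3: {d: 3 for d in DIGITS},                # fractional part: more digits
}
ACCEPTING = {3}  # states where the output is a complete match

def advance(state, token):
    """Return the state after consuming `token`, or None if it is invalid here."""
    for ch in token:
        state = TRANSITIONS.get(state, {}).get(ch)
        if state is None:
            return None
    return state

def allowed_tokens(state, vocab):
    """Check every vocabulary token against the current state."""
    return [t for t in vocab if advance(state, t) is not None]

# Toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["1", "23", ".", ".5", "a", "4.", "x1"]

state = 0
first = allowed_tokens(state, VOCAB)    # valid first tokens
state = advance(state, "23")            # pretend the model emitted "23"
second = allowed_tokens(state, VOCAB)   # valid tokens after "23"
```

Note that a token can span several characters, so each token is walked through the machine one character at a time.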

This method is hard to scale in real cases, because checking every token in the vocabulary at every generation step is slow. The solution is to pre-generate, for each state, all the possible tokens that can follow from it.
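The pre-generation idea can be sketched as a lookup table built once, before decoding starts. This is an illustrative sketch only. The DFA, the vocabulary, and the helper names `build_index` and `mask_logits` are assumptions and do not come from any library.

```python
# Hand-built DFA for the regex [0-9]+\.[0-9]+ (hypothetical example format).
DIGITS = "0123456789"
TRANSITIONS = {
    0: {d: 1 for d in DIGITS},
    1: {**{d: 1 for d in DIGITS}, ".": 2},
    2: {d: 3 for d in DIGITS},
    3: {d: 3 for d in DIGITS},
}
VOCAB = ["1", "23", ".", ".5", "a", "4.", "x1"]  # toy vocabulary

def advance(state, token):
    """Return the state after consuming `token`, or None if it is invalid here."""
    for ch in token:
        state = TRANSITIONS.get(state, {}).get(ch)
        if state is None:
            return None
    return state

def build_index(transitions, vocab):
    """Precompute, for each state, {token_id: next_state} for every valid token."""
    index = {}
    for state in transitions:
        index[state] = {}
        for token_id, token in enumerate(vocab):
            nxt = advance(state, token)
            if nxt is not None:
                index[state][token_id] = nxt
    return index

# Paid once up front: (number of states) x (vocabulary size) checks.
INDEX = build_index(TRANSITIONS, VOCAB)

def mask_logits(logits, state):
    """At decode time, a dictionary lookup replaces the full vocabulary scan."""
    allowed = INDEX[state]
    return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]

# After a "." the machine is in state 2, where only digit tokens survive.
masked = mask_logits([0.0] * len(VOCAB), 2)
```

The trade-off is memory for speed: the table grows with the number of states times the vocabulary size, but each generation step only needs one lookup.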


