Structured Output
How can an LLM follow the format defined by structured output?
One good explanation is this YouTube video.
1 OpenAI API and Outlines lib
The OpenAI API uses Pydantic schemas, while Outlines uses regular expressions.
A finite state machine (FSM) is maintained for the regular-expression output: you track which FSM state the generated text has reached so far, and check whether each candidate next token keeps the output valid.
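A minimal sketch of the state-tracking idea (a hand-rolled DFA for the toy pattern `[0-9]+`, not the actual Outlines implementation): walk the DFA over each character of a candidate token, and reject the token if any character leaves the DFA with no valid transition.

```python
# Toy DFA for the regex "[0-9]+" (assumption, for illustration only).
# States: 0 = start (no digit yet), 1 = accepting (one or more digits seen).

def step(state, char):
    """Return the next DFA state, or None if `char` breaks the pattern."""
    return 1 if char.isdigit() else None  # "[0-9]+" only accepts digits

def advance(state, token):
    """Walk the DFA over every character of a candidate token.

    Returns the end state if the token keeps the output valid, else None.
    """
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None  # this token would violate the format
    return state

# Track state across decoding steps: "12" is accepted, "a3" is rejected.
state = advance(0, "12")   # -> 1 (valid so far)
bad = advance(1, "a3")     # -> None (rejected)
```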
The performance of this method is hard to scale in practice, because checking every vocabulary token against the FSM at every decoding step is expensive. The solution is to pre-generate, for each FSM state, the set of all valid next tokens.
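The precomputation can be sketched as follows (toy vocabulary and DFA are assumptions, not the real Outlines code): build a table mapping each FSM state to the set of token ids that keep the output valid, so each decoding step becomes a single lookup instead of a scan over the whole vocabulary.

```python
# Toy vocabulary and DFA for "[0-9]+" (assumptions, for illustration only).
VOCAB = ["1", "23", "4a", "abc", "7"]

def step(state, char):
    # DFA for "[0-9]+": non-digits have no valid transition.
    return 1 if char.isdigit() else None

def end_state(state, token):
    """End state after consuming `token` from `state`, or None if invalid."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

# Precompute once: FSM state -> set of token ids that are valid from it.
allowed = {
    s: {i for i, tok in enumerate(VOCAB) if end_state(s, tok) is not None}
    for s in (0, 1)
}
# At decode time, mask the logits so only `allowed[current_state]` can be sampled.
```

From state 0 only the all-digit tokens `"1"`, `"23"`, and `"7"` survive, so `allowed[0]` is `{0, 1, 4}`.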