Introduction
Standard large language models respond to user queries by generating plain text. This works well for many applications like chatbots, but if you want to programmatically access details in the response, plain text is hard to work with. Some models can respond with structured JSON instead, making it easy to work with data from the LLM's output directly in your application code. If you're using a supported model, you can enable structured responses by providing your desired schema details to the response_format key of the Chat Completions API.
Supported models
The following models currently support JSON mode:
- meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo (32K context)
- meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
- meta-llama/Llama-3.2-3B-Instruct-Turbo
- meta-llama/Llama-3.3-70B-Instruct-Turbo
- meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
- meta-llama/Llama-4-Scout-17B-16E-Instruct
- deepseek-ai/DeepSeek-V3
- Qwen/Qwen3-235B-A22B-fp8-tput
- Qwen/Qwen2.5-VL-72B-Instruct
Basic example
Let's look at a simple example, where we pass a transcript of a voice note to a model and ask it to summarize it. We want the summary to have a specific structure, which we describe with a JSON schema and pass to the response_format key.
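One common way to produce that JSON schema is with a Pydantic model, whose model_json_schema() method generates it for us. Here is a minimal sketch; the field names are illustrative assumptions, not something the API prescribes:

```python
# A sketch of the summary structure as a Pydantic model.
# Field names (title, summary, action_items) are illustrative assumptions.
from pydantic import BaseModel, Field

class VoiceNoteSummary(BaseModel):
    title: str = Field(description="A short title for the voice note")
    summary: str = Field(description="A one-sentence summary of the voice note")
    action_items: list[str] = Field(description="Action items mentioned in the note")

# model_json_schema() produces the JSON Schema dict we pass to the API.
print(VoiceNoteSummary.model_json_schema())
```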
Finally (and this is important), we need to make sure to instruct our model to only respond in JSON format. This ensures it will actually use the schema we provide when generating its response.
Important: You must always instruct your model to only respond in JSON format, either in the system prompt or a user message, in addition to passing your schema to the response_format key.
Let's see what this looks like:
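Below is a sketch of the full request, assuming the Together Python SDK and the illustrative VoiceNoteSummary model from above (repeated here so the snippet runs on its own). The response_format payload follows Together's JSON-mode shape of a type plus a schema; verify the exact shape against the current API reference. The transcript string is a stand-in:

```python
import json

from pydantic import BaseModel, Field
from together import Together

# Repeated from the previous snippet; field names are illustrative.
class VoiceNoteSummary(BaseModel):
    title: str = Field(description="A short title for the voice note")
    summary: str = Field(description="A one-sentence summary of the voice note")
    action_items: list[str] = Field(description="Action items mentioned in the note")

client = Together()  # reads TOGETHER_API_KEY from the environment

transcript = "..."  # stand-in for the real voice note transcript

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        # The explicit "only answer in JSON" instruction is required
        # in addition to the schema passed below.
        {"role": "system", "content": "Summarize the voice note. Only answer in JSON."},
        {"role": "user", "content": transcript},
    ],
    response_format={
        "type": "json_object",
        "schema": VoiceNoteSummary.model_json_schema(),
    },
)

# The model replies with a JSON string that matches the schema.
summary = json.loads(response.choices[0].message.content)
print(summary["title"], summary["action_items"])
```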
Vision model example
Let's look at another example, this time using a vision model. We want our LLM to extract text from a screenshot of a Trello board.
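The screenshot itself isn't reproduced here, so the image URL below is a placeholder. Here is a sketch of the request, assuming the same SDK, an illustrative board schema, and the OpenAI-style image_url message format that vision models accept:

```python
import json

from pydantic import BaseModel, Field
from together import Together

# Illustrative schema for the extracted board text; adjust to your needs.
class TrelloList(BaseModel):
    name: str = Field(description="The title of the list")
    cards: list[str] = Field(description="The card titles in the list")

class TrelloBoard(BaseModel):
    lists: list[TrelloList] = Field(description="All lists visible on the board")

client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",  # a vision model from the supported list
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the text from this Trello board. Only answer in JSON."},
                # Placeholder URL; substitute your actual screenshot.
                {"type": "image_url", "image_url": {"url": "https://example.com/trello-board.png"}},
            ],
        },
    ],
    response_format={
        "type": "json_object",
        "schema": TrelloBoard.model_json_schema(),
    },
)

board = json.loads(response.choices[0].message.content)
print(json.dumps(board, indent=2))
```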
Try out your code in the Together Playground
You can try out JSON Mode in the Together Playground to test variations on your schema and prompt.