Introduction
Standard large language models respond to user queries by generating plain text. This is great for many applications like chatbots, but if you want to programmatically access details in the response, plain text is hard to work with.
Some models have the ability to respond with structured JSON instead, making it easy to work with data from the LLM's output directly in your application code.
If you're using a supported model, you can enable structured responses by providing your desired schema details to the response_format key of the Chat Completions API.
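Concretely, response_format is a dictionary with a type of "json_object" and a schema field holding your JSON Schema. Here's a minimal sketch of its shape; the schema is hand-written for illustration, while the examples below generate it from a Pydantic model:

```python
# Minimal shape of the response_format parameter. The schema here is a
# hand-written JSON Schema fragment, just for illustration.
response_format = {
    "type": "json_object",
    "schema": {
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
    },
}
```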
Supported models
The following models currently support JSON mode:
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo (32K context)
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
meta-llama/Llama-3.2-3B-Instruct-Turbo
meta-llama/Llama-3.3-70B-Instruct-Turbo
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
meta-llama/Llama-4-Scout-17B-16E-Instruct
deepseek-ai/DeepSeek-V3
Qwen/Qwen3-235B-A22B-fp8-tput
Qwen/Qwen2.5-VL-72B-Instruct
Basic example
Let's look at a simple example, where we pass a transcript of a voice note to a model and ask it to summarize it.
We want the summary to have the following structure:
{
  "title": "A title for the voice note",
  "summary": "A short one-sentence summary of the voice note",
  "actionItems": [
    "Action item 1",
    "Action item 2"
  ]
}
We can tell our model to use this structure by giving it a JSON Schema definition. Since writing JSON Schema by hand is a bit tedious, we'll use a library to help: Pydantic in Python, or Zod in TypeScript.
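To see what Pydantic produces, here's a small sketch (assuming Pydantic v2): model_json_schema() returns a plain dict that follows the JSON Schema spec, with the field descriptions carried through.

```python
import json

from pydantic import BaseModel, Field


class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(description="A short one-sentence summary of the voice note")
    actionItems: list[str] = Field(description="A list of action items from the voice note")


# model_json_schema() (Pydantic v2) returns a plain dict following the
# JSON Schema spec: an "object" with "properties" and "required" entries.
schema = VoiceNote.model_json_schema()
print(json.dumps(schema, indent=2))
```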
Once we have the schema, we can give it to our model using the response_format key.
Finally, and this is important, we need to make sure to instruct our model to only respond in JSON format. This ensures it will actually use the schema we provide when generating its response.
Important: You must always instruct your model to only respond in JSON format, either in the system prompt or a user message, in addition to passing your schema to the response_format key.
Let's see what this looks like:
import json

import together
from pydantic import BaseModel, Field

client = together.Together()


# Define the schema for the output
class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(description="A short one-sentence summary of the voice note.")
    actionItems: list[str] = Field(
        description="A list of action items from the voice note"
    )


def main():
    transcript = (
        "Good morning! It's 7:00 AM, and I'm just waking up. Today is going to be a busy day, "
        "so let's get started. First, I need to make a quick breakfast. I think I'll have some "
        "scrambled eggs and toast with a cup of coffee. While I'm cooking, I'll also check my "
        "emails to see if there's anything urgent."
    )

    # Call the LLM with the JSON schema
    extract = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "The following is a voice message transcript. Only answer in JSON.",
            },
            {
                "role": "user",
                "content": transcript,
            },
        ],
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        response_format={
            "type": "json_object",
            "schema": VoiceNote.model_json_schema(),
        },
    )

    output = json.loads(extract.choices[0].message.content)
    print(json.dumps(output, indent=2))
    return output


main()
If we try it out, our model responds with the following:
{
"title": "Morning Routine",
"summary": "Starting the day with a quick breakfast and checking emails",
"actionItems": [
"Cook scrambled eggs and toast",
"Brew a cup of coffee",
"Check emails for urgent messages"
]
}
Pretty neat!
Our model has generated a summary of the user's transcript using the schema we gave it.
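As an optional hardening step (not shown in the example above), you can round-trip the response through the same Pydantic model: model_validate_json parses and validates in one call, raising a ValidationError if the output ever drifts from the schema. The response string below is a stand-in for extract.choices[0].message.content:

```python
import json

from pydantic import BaseModel, Field


class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(description="A short one-sentence summary of the voice note")
    actionItems: list[str] = Field(description="A list of action items from the voice note")


# Stand-in for extract.choices[0].message.content
response_content = json.dumps({
    "title": "Morning Routine",
    "summary": "Starting the day with a quick breakfast and checking emails",
    "actionItems": ["Cook scrambled eggs and toast"],
})

# Parse and validate in one step; raises pydantic.ValidationError on mismatch
note = VoiceNote.model_validate_json(response_content)
print(note.title)
```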
Vision model example
Let's look at another example, this time using a vision model.
We want our LLM to extract text from the following screenshot of a Trello board:
In particular, we want to know the name of the project (Project A), and the number of columns in the board (4).
Let's try it out:
import json

import together
from pydantic import BaseModel, Field

client = together.Together()


# Define the schema for the output
class ImageDescription(BaseModel):
    project_name: str = Field(description="The name of the project shown in the image")
    col_num: int = Field(description="The number of columns in the board")


def main():
    image_url = "https://napkinsdev.s3.us-east-1.amazonaws.com/next-s3-uploads/d96a3145-472d-423a-8b79-bca3ad7978dd/trello-board.png"

    # Call the LLM with the JSON schema
    extract = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Extract a JSON object from the image."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_url,
                        },
                    },
                ],
            },
        ],
        model="Qwen/Qwen2.5-VL-72B-Instruct",
        response_format={
            "type": "json_object",
            "schema": ImageDescription.model_json_schema(),
        },
    )

    output = json.loads(extract.choices[0].message.content)
    print(json.dumps(output, indent=2))
    return output


main()
If we run it, we get the following output:
{
  "project_name": "Project A",
  "col_num": 4
}
JSON mode has worked perfectly alongside Qwen's vision model to help us extract structured text from an image!
Try out your code in the Together Playground
You can use the Together Playground to try out JSON mode and test variations on your schema and prompt:
Just click the RESPONSE FORMAT dropdown in the right-hand sidebar, choose JSON, and upload your schema!