Running a single query
Use chat.completions.create to send a single query to a chat model. The create method takes a model name and a messages array. Each message is an object that has the content of the query, as well as a role for the message's author.
In the example above, you can see that we're using "user" for the role. The "user" role tells the model that this message comes from the end user of our system, for example a customer using your chatbot app.
The other two roles are "assistant" and "system", which we'll talk about next.
Having a long-running conversation
Every query to a chat model is self-contained. This means that new queries won't automatically have access to any queries that may have come before them. This is exactly why the "assistant" role exists. The "assistant" role is used to provide historical context for how a model has responded to prior queries. This makes it perfect for building apps that have long-running conversations, like chatbots. To provide a chat history for a new query, pass the previous messages to the messages array, denoting the user-provided queries with the "user" role, and the model's responses with the "assistant" role:
Customizing how the model responds
While you can query a model just by providing a user message, typically you'll want to give your model some context for how you'd like it to respond. For example, if you're building a chatbot to help your customers with travel plans, you might want to tell your model that it should act like a helpful travel guide. To do this, provide an initial message that uses the "system" role.
Streaming responses
Since models can take some time to respond to a query, Together's APIs support streaming responses back in chunks. This lets you display results from each chunk while the model is still running, instead of waiting for the entire response to finish. To return a stream, set the stream option to true. (If you're calling the HTTP API directly, the option name is stream_tokens.)
A note on async support in Python
Since I/O in Python is synchronous by default, multiple queries will execute one after another in sequence, even if they are independent. If you have multiple independent calls that you want to run in parallel, you can use our Python library's AsyncTogether module: