Welcome to the Together AI docs! Together AI makes it easy to run or fine-tune leading open source models with only a few lines of code. We offer a variety of generative AI services:
  • Serverless models - Use our API or playground to run dozens of models with pay-as-you-go pricing.
  • Fine-Tuning - Fine-tune models on your own data in 5 minutes, then run the model for inference.
  • Dedicated endpoints - Run models on your own private GPUs, starting at a one month minimum commitment.
  • GPU Clusters - If you're interested in private, state-of-the-art clusters with H100 GPUs, contact us.

Quickstart

See our full quickstart for how to get started with our API in 1 minute.
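from together import Together

# The client reads your API key from the TOGETHER_API_KEY environment variable.
client = Together()

# Send a single-turn chat request to a serverless model.
completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What are the top 3 things to do in New York?"}],
)

# Print the text of the first returned choice.
print(completion.choices[0].message.content)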

Which model should I use?

Together hosts many popular models via our serverless endpoints. For each of these, you'll be charged based on the number of tokens you use and the size of the model. Several different types of models are supported.

Don't see a model you want to use? Send us a request to add it, or upvote the model you'd love to see on our serverless infrastructure.
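
To make the token-based pricing above concrete, here is a minimal sketch of how a per-request cost could be estimated. It assumes a single flat rate per million tokens that covers both prompt and completion tokens. The function name `estimate_cost_usd` and the price constant are hypothetical and used only for illustration; real rates vary by model, so check the pricing page for actual numbers.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      price_per_million_usd: float) -> float:
    """Estimate the cost of one request under a flat per-token rate.

    Assumption: prompt and completion tokens are billed at the same rate.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens * price_per_million_usd / 1_000_000


# Hypothetical price for illustration only, NOT a real Together AI rate.
HYPOTHETICAL_PRICE_PER_MILLION_USD = 0.20

# Example: a 1,200-token prompt that produces a 300-token answer.
cost = estimate_cost_usd(1_200, 300, HYPOTHETICAL_PRICE_PER_MILLION_USD)
print(f"Estimated cost: ${cost:.6f}")
```

Because larger models are generally priced higher per token, the same request would cost more on a bigger model; only the rate passed to the function changes.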

Together Cookbook

See the Together Cookbook, a collection of notebooks showcasing use cases of open-source models with Together AI. Examples include RAG (text and multimodal), semantic search, rerankers, and structured JSON extraction.

Example apps

We've built a number of full-stack open source example apps that you can reference. These production-ready apps have over 500k users and 10k GitHub stars combined, and all of them are fully open source and built on Together AI.
  • LlamaCoder (GitHub) - an open source take on Claude Artifacts that can generate full React apps from a single prompt. Built on Llama 3.1 405B powered by Together inference.
  • BlinkShot (GitHub) - a realtime AI image generator using Flux Schnell on Together AI. Type in a prompt and images are generated as you type.
  • TurboSeek (GitHub) - an AI search engine inspired by Perplexity. It uses a search API (Serper) along with an LLM (Mixtral) to answer questions.
  • Napkins.dev (GitHub) - a wireframe to app tool. It uses Llama 3.2 vision to read in screenshots and write code for them using Llama 3.1 405B.
  • PDFToChat (GitHub) - a site that lets you chat with your PDFs. Uses RAG with Together embeddings, inference with Llama 3, authentication with Clerk, & MongoDB/Pinecone for the vector database.
  • LlamaTutor (GitHub) - a personal tutor that can explain any topic at any education level by using a search API along with Llama 3.1.
  • NotesGPT (GitHub) - an AI note taker that converts your voice notes into organized summaries and clear action items using AI. Uses Together inference (Mixtral) with JSON mode.
  • CareerExplorer (GitHub) - a site that takes in a resume and suggests career paths based on your strengths and interests. Uses Llama 3 and demonstrates how to parse PDFs and chain multiple calls together.
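
Several of the apps above rely on retrieval-augmented generation (RAG). The sketch below illustrates only the retrieval step and is not taken from any of these apps: it ranks text chunks by cosine similarity to a query vector and builds a prompt from the best matches. The three-dimensional vectors are hand-made toy values standing in for what an embeddings model would return, and the helper names (`cosine_similarity`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, indexed_chunks, top_k=2):
    """Return the text of the top_k chunks most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy index: (chunk text, made-up embedding). A real app would compute
# these vectors with an embeddings model and keep them in a vector database.
chunks = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.1, 0.9, 0.0]),
    ("Late invoices incur a 2% fee.", [0.8, 0.2, 0.1]),
]

query_vec = [1.0, 0.0, 0.0]  # made-up embedding of the question
top_chunks = retrieve(query_vec, chunks, top_k=2)
prompt = build_prompt("When are invoices due?", top_chunks)
print(prompt)
```

The resulting prompt would then be sent as the user message in a chat completion request like the one in the Quickstart.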
