POST /audio/speech
Create audio generation request
curl --request POST \
  --url https://api.together.xyz/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "cartesia/sonic",
  "input": "<string>",
  "voice": "laidback woman",
  "response_format": "wav",
  "language": "en",
  "response_encoding": "pcm_f32le",
  "sample_rate": 44100,
  "stream": false
}'
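For readers calling the endpoint from code rather than the shell, the same request can be sketched in Python with only the standard library. The function name `build_speech_request` is a hypothetical helper for illustration, not part of any Together AI SDK; the URL, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.together.xyz/v1/audio/speech"


def build_speech_request(token: str, text: str,
                         voice: str = "laidback woman") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper: the name and signature are illustrative only.
    """
    body = {
        "model": "cartesia/sonic",
        "input": text,
        "voice": voice,
        "response_format": "wav",
        "language": "en",
        "response_encoding": "pcm_f32le",
        "sample_rate": 44100,
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building and sending are kept separate here so the payload can be inspected without making a network call.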

Authorizations

Authorization
string
header
default:default
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
model
required

The name of the model to query. See all of Together AI's chat models.

Available options:
cartesia/sonic
Example:

"cartesia/sonic"

input
string
required

Input text to generate the audio for

voice
required

The voice to use for generating the audio. View all supported voices here.

Available options:
laidback woman, polite man, storyteller lady, friendly sidekick

response_format
enum<string>
default:wav

The format of audio output

Available options:
mp3, wav, raw

language
enum<string>
default:en

Language of input text

Available options:
en, de, fr, es, hi, it, ja, ko, nl, pl, pt, ru, sv, tr, zh

response_encoding
enum<string>
default:pcm_f32le

Audio encoding of response

Available options:
pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw

sample_rate
number
default:44100

Sampling rate to use for the output audio
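The encoding and sample rate together determine how raw sample bytes should be interpreted. As a minimal sketch, the helper below converts pcm_f32le samples into a 16-bit WAV container using the Python standard library. It assumes that a raw response carries headerless, single-channel samples in the requested encoding; the source does not state the channel count or the exact layout, so treat both as assumptions. The name `f32le_to_wav` is hypothetical.

```python
import io
import struct
import wave


def f32le_to_wav(raw: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Wrap little-endian float32 PCM samples in a 16-bit WAV container.

    Assumption: `raw` is headerless pcm_f32le audio; mono is assumed by default.
    """
    count = len(raw) // 4  # four bytes per float32 sample
    floats = struct.unpack(f"<{count}f", raw[: count * 4])
    # Clamp to [-1.0, 1.0], then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, x)) * 32767) for x in floats]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{count}h", *ints))
    return buffer.getvalue()
```

The sample rate passed here should match the sample_rate sent in the request; otherwise playback speed and pitch will be off.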

stream
boolean
default:false

If true, the output is streamed several characters at a time rather than returned once the full response is ready. The stream terminates with data: [DONE]. If false, the encoded audio is returned as an octet stream.
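A streaming client has to stop reading at the data: [DONE] terminator. The sketch below shows one way to do that over an iterable of response lines. The description above only specifies the terminator, so the shape of each payload is not assumed here: payloads are yielded as opaque strings, and the name `iter_stream_payloads` is hypothetical.

```python
from typing import Iterable, Iterator

DONE_MARKER = "[DONE]"


def iter_stream_payloads(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the payload of each `data:` line until the [DONE] terminator.

    Assumption: events arrive as newline-delimited `data: ...` lines;
    payload contents are left uninterpreted.
    """
    for raw_line in lines:
        line = raw_line.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == DONE_MARKER:
            return  # the stream is finished; ignore anything after it
        yield payload
```

An HTTP response object returned by `urllib.request.urlopen` can be iterated line by line, so it could be passed in directly as `lines`.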

Response

OK

The response is of type file.
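Because a non-streaming response body is the audio file itself, a client can copy it straight to disk. The sketch below does this with the standard library. `save_audio_response` and its `opener` parameter are hypothetical names; `opener` defaults to `urllib.request.urlopen` and exists only so the function can be exercised without a network call.

```python
import shutil
import urllib.request
from pathlib import Path


def save_audio_response(request, out_path, opener=urllib.request.urlopen) -> int:
    """Send `request`, write the response body to `out_path`, return its size in bytes.

    Assumption: the body is the complete audio file (stream set to false).
    """
    with opener(request) as response, open(out_path, "wb") as out_file:
        # Copy in chunks so large audio files are not held fully in memory.
        shutil.copyfileobj(response, out_file)
    return Path(out_path).stat().st_size
```

The file extension should be chosen to match the response_format that was requested, for example .wav or .mp3.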