API Reference

Stream audio bytes with our ultra-fast text-in, audio-out API.

ℹ️

Turbo Model

For our newest PlayDialog-turbo engine, powered by Groq, see the Turbo get started guide. It uses the same endpoint but accepts different voice inputs.

Our HTTP streaming endpoint lets you send text and receive audio bytes in real time. You can also stream with our Python SDK or Node.js SDK.

Get your Credentials

To use the HTTP API, you need an API Key and a User Id. Both are easy to generate; see this guide for a how-to.

Example

For code examples, see the interactive code snippets to the right. The provided examples will return an audio buffer stream that you can use to save locally or stream over the network to a browser, app, or telephony system.

For the complete list of supported parameters, see below.
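As a rough illustration of the flow described above (send text, receive an audio byte stream, save it locally), here is a minimal Python sketch using only the standard library. The endpoint URL is passed in by the caller rather than hard-coded, and the header names `AUTHORIZATION` and `X-USER-ID` as well as the body field name `text` are assumptions not stated on this page; confirm them against the interactive snippets before relying on them.

```python
import json
import urllib.request


def build_stream_request(url, api_key, user_id, payload):
    """Build (but do not send) a POST request carrying a JSON body."""
    headers = {
        # Assumed header names; check the credentials guide for the real ones.
        "AUTHORIZATION": api_key,
        "X-USER-ID": user_id,
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_stream(response, path, chunk_size=4096):
    """Copy a file-like response to disk chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

To send the request, pass the result of `build_stream_request` to `urllib.request.urlopen` inside a `with` block and hand the response object to `save_stream`. Reading in chunks means the audio can be written (or forwarded) as it arrives instead of being buffered whole.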

Body Params
string
required
Defaults to Hello Play!

The text to be converted to speech. Limited to 20k characters when voice_engine is set to Play3.0-mini, or 2000 characters otherwise.
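The length limit above can be checked on the client before a request is sent. This is a small sketch, assuming the limit counts characters the way Python's `len()` does; the helper names are hypothetical.

```python
def max_text_length(voice_engine):
    """Return the documented character limit for the given voice engine."""
    return 20_000 if voice_engine == "Play3.0-mini" else 2_000


def check_text(text, voice_engine):
    """Raise ValueError if the text exceeds the engine's documented limit."""
    limit = max_text_length(voice_engine)
    if len(text) > limit:
        raise ValueError(
            f"text has {len(text)} characters; limit for {voice_engine} is {limit}"
        )
    return text
```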

string
required
Defaults to s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json

The unique ID for a PlayHT or Cloned Voice.

string

The quality level of the audio. Not supported when voice_engine is set to PlayDialog-turbo.

string | null
Defaults to wav

The format for the output audio. Note that PlayHT1.0 engine voices and JSON output format only support 'mp3' and 'mulaw'. Currently, PlayDialog-turbo only supports wav.
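The two format restrictions named above can be encoded as a lookup. This sketch only covers the engines this page explicitly restricts; the full list of formats for other engines is not given here, so the function treats them as unrestricted. The engine string `PlayHT1.0` and the function name are assumptions.

```python
# Engines with a restricted set of output formats, per the description above.
RESTRICTED_FORMATS = {
    "PlayHT1.0": {"mp3", "mulaw"},
    "PlayDialog-turbo": {"wav"},
}


def format_is_allowed(voice_engine, output_format):
    """Return False only when the engine is known to reject the format."""
    allowed = RESTRICTED_FORMATS.get(voice_engine)
    return allowed is None or output_format in allowed
```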

number
0.1 to 5

Controls how fast the generated audio is spoken. A number greater than 0 and less than or equal to 5.0.

number
8000 to 48000

The sample rate of the audio in Hz. Not supported when voice_engine is set to PlayDialog-turbo.

number | null
≥ 0

A seed value for reproducible results. Not supported when voice_engine is set to PlayDialog-turbo.

number | null
0 to 2

Controls the randomness of the output. Not supported when voice_engine is set to PlayDialog-turbo.
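The numeric ranges and the PlayDialog-turbo restrictions listed so far can be checked together before sending a request. This is a sketch under assumptions: the field names `speed`, `sample_rate`, `seed`, `temperature`, and `quality` are hypothetical labels for the parameters described above, and the server remains the authority on what it accepts.

```python
# (low, high) bounds taken from the ranges listed above; None means unbounded.
RANGES = {
    "speed": (0.1, 5.0),
    "sample_rate": (8000, 48000),
    "seed": (0, None),
    "temperature": (0, 2),
}

# Parameters described above as not supported with PlayDialog-turbo.
TURBO_UNSUPPORTED = {"quality", "sample_rate", "seed", "temperature"}


def validate(params):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    engine = params.get("voice_engine")
    for name, value in params.items():
        if value is None:
            continue  # null is treated as "not set"
        if engine == "PlayDialog-turbo" and name in TURBO_UNSUPPORTED:
            problems.append(f"{name} is not supported with PlayDialog-turbo")
        if name in RANGES:
            low, high = RANGES[name]
            if value < low or (high is not None and value > high):
                problems.append(f"{name}={value} is out of range")
    return problems
```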

string | null
Defaults to PlayDialog

The voice engine used to synthesize the voice.

string | null

An emotion to be applied to the speech. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0 or PlayHT2.0-turbo, and voice uses that engine.

number | null
1 to 6

A number between 1 and 6. Use lower numbers to reduce how unique your chosen voice will be compared to other voices. Higher numbers will maximize its individuality. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0 or PlayHT2.0-turbo, and voice uses that engine.

number | null
1 to 30

A number between 1 and 30. Use lower numbers to reduce how strong your chosen emotion will be. Higher numbers will create a very emotional performance. Only supported when voice_engine is set to Play3.0-mini, PlayHT2.0 or PlayHT2.0-turbo, and voice uses that engine.

number | null
1 to 2

A number between 1 and 2. This number influences how closely the generated speech adheres to the input text. Use lower values to create more fluid speech, but with a higher chance of deviating from the input text. Higher numbers will make the generated speech more accurate to the input text, ensuring that the words spoken align closely with the provided text. Only supported when voice_engine is set to Play3.0-mini or PlayHT2.0, and voice uses that engine.
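The three guidance settings above differ both in range and in which engines accept them. The sketch below keeps only the settings the chosen engine supports and clamps each into its documented range. The field names `voice_guidance`, `style_guidance`, and `text_guidance` are hypothetical labels for the three parameters described above, and clamping is one possible policy; rejecting out-of-range values would be equally reasonable.

```python
_EMOTION_ENGINES = {"Play3.0-mini", "PlayHT2.0", "PlayHT2.0-turbo"}

# name -> (low, high, engines that support it), per the descriptions above.
GUIDANCE_RULES = {
    "voice_guidance": (1, 6, _EMOTION_ENGINES),
    "style_guidance": (1, 30, _EMOTION_ENGINES),
    "text_guidance": (1, 2, {"Play3.0-mini", "PlayHT2.0"}),
}


def filter_guidance(voice_engine, settings):
    """Drop unsupported guidance settings and clamp the rest into range."""
    kept = {}
    for name, value in settings.items():
        low, high, engines = GUIDANCE_RULES[name]
        if voice_engine in engines:
            kept[name] = min(max(value, low), high)
    return kept
```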

string | null

The language of the voice. Only supported when voice_engine is set to Play3.0-mini or PlayDialog, and voice has 2.0 engine or later. When voice_engine is set to PlayDialog-turbo, only English and Arabic are supported.

string | null

Only supported when voice_engine is set to PlayDialog. The unique ID for a Voice to be used as the second character in multi-turn dialogue generations.

string | null

Only supported when voice_engine is set to PlayDialog. The prefix to indicate the start of a turn in a multi-turn dialogue with voice.

string | null

Only supported when voice_engine is set to PlayDialog. The prefix to indicate the start of a turn in a multi-turn dialogue with voice_2.
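To illustrate how the two turn prefixes relate to the input text, the sketch below assembles a multi-turn script in which each turn starts with the prefix of its speaker. The prefix strings `Host 1:` and `Host 2:` and the function name are illustrative assumptions; whatever prefixes you choose would be sent alongside the text so the engine can assign each turn to voice or voice_2.

```python
def build_dialogue(turns, prefix_1="Host 1:", prefix_2="Host 2:"):
    """Join (speaker, line) pairs into one text, marking each turn's speaker.

    `speaker` is 1 for the first voice and 2 for the second voice.
    """
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        prefix = prefix_1 if speaker == 1 else prefix_2
        parts.append(f"{prefix} {line.strip()}")
    return " ".join(parts)
```

For example, `build_dialogue([(1, "Hi."), (2, "Hello!")])` produces a single string with both turns in order, each introduced by its prefix.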

number | null

Only supported when voice_engine is set to PlayDialog. The number of seconds of conditioning to use from the selected voice. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness. Defaults to 20.

number | null

Only supported when voice_engine is set to PlayDialog. The number of seconds of conditioning to use from the selected voice_2. Lower values generate audio less similar to the cloned voice, but lead to more model stability and expressiveness. Higher values create output more similar to the cloned voice, but can lead to model instability and reduced expressiveness. Defaults to 20.

Responses

Check the Content-Type header of your request and ensure that it is set to a supported media type. You may also need to update your request payload to match the expected media type for the resource you are trying to access.
