Techniques for getting the lowest possible latency when streaming audio in real time with the PlayHT API

Use the PlayHT Turbo model

To get the lowest latency, you must use our Turbo model.

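```javascript
import * as PlayHT from "playht";

// Initialize PlayHT API with your credentials
PlayHT.init({
  apiKey: "<YOUR_PLAY_HT_API_KEY>",
  userId: "<YOUR_PLAY_HT_USER_ID>",
});

// When using the Node.js SDK, pass the voiceEngine as "PlayHT2.0-turbo".
// stream() resolves to a readable stream that emits audio chunks as they are generated.
const stream = await PlayHT.stream("Hello from a realistic voice.", { voiceEngine: "PlayHT2.0-turbo" });
stream.on("data", (chunk) => {
  // Forward each audio chunk to your player, socket, or file as soon as it arrives
});
```

With the REST API, set "voice_engine" to "PlayHT2.0-turbo" in the request body:

```shell
curl --request POST \
     --url https://api.play.ht/api/v2/tts/stream \
     --header 'AUTHORIZATION: <YOUR_PLAY_HT_API_KEY>' \
     --header 'X-USER-ID: <YOUR_PLAY_HT_USER_ID>' \
     --header 'accept: audio/mpeg' \
     --header 'content-type: application/json' \
     --data '
{
  "text": "Hello from a realistic voice.",
  "voice_engine": "PlayHT2.0-turbo"
}
'
```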
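If you call the REST endpoint from Node.js without the SDK, the cURL request above translates directly to fetch. The following is a minimal sketch, assuming Node.js 18+ (global fetch, async-iterable response bodies); the helper names buildTurboStreamRequest and streamTurboAudio are hypothetical and not part of any PlayHT SDK. Like the cURL example, the body carries only text and voice_engine, so add any other fields your request needs (for example, which voice to use).

```javascript
// Build the same request as the cURL example: same endpoint, headers, and body fields.
// buildTurboStreamRequest is a hypothetical helper, not a PlayHT SDK function.
function buildTurboStreamRequest(text, { apiKey, userId }) {
  return {
    url: "https://api.play.ht/api/v2/tts/stream",
    init: {
      method: "POST",
      headers: {
        AUTHORIZATION: apiKey,
        "X-USER-ID": userId,
        accept: "audio/mpeg",
        "content-type": "application/json",
      },
      body: JSON.stringify({ text, voice_engine: "PlayHT2.0-turbo" }),
    },
  };
}

// Send the request and hand each audio chunk to onChunk as soon as it arrives,
// instead of buffering the whole response body first.
async function streamTurboAudio(text, credentials, onChunk) {
  const { url, init } = buildTurboStreamRequest(text, credentials);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`PlayHT request failed: ${response.status} ${await response.text()}`);
  }
  // In Node.js 18+, response.body is a web ReadableStream, which is async-iterable.
  for await (const chunk of response.body) {
    onChunk(chunk);
  }
}
```

Usage might look like `await streamTurboAudio("Hello from a realistic voice.", { apiKey: "<YOUR_PLAY_HT_API_KEY>", userId: "<YOUR_PLAY_HT_USER_ID>" }, (chunk) => player.write(chunk))`, where player stands in for whatever audio sink you use. Iterating the body chunk by chunk is what keeps the REST path low-latency: playback can begin with the first chunk.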

Use the real-time streaming API, not the async (non-streaming) API

Make sure to use the real-time streaming methods (when using an SDK) or the streaming endpoint (when using the REST API). The streaming API doesn't wait for the full audio to be generated: it starts streaming within 200-300ms. The async API is slower because it waits until the full audio has been generated before returning it.

Check the Real-time streaming section for more details on how to do that with each SDK or through the REST API.
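To check the latency you actually get, you can time how long a streaming call takes to deliver its first audio chunk versus the whole response. The following is a minimal sketch; measureStreamLatency is a hypothetical helper, and it assumes the stream you pass in is async-iterable (Node.js readable streams, such as the one PlayHT.stream() is expected to resolve to, and fetch response bodies both are).

```javascript
// Measure time to first audio chunk and total time for any streaming call.
// startStream is a function returning an async-iterable of byte chunks (or a promise
// of one), e.g. () => PlayHT.stream(text, { voiceEngine: "PlayHT2.0-turbo" }).
async function measureStreamLatency(startStream) {
  const start = performance.now();
  const stream = await startStream();
  let firstChunkMs = null;
  let totalBytes = 0;
  for await (const chunk of stream) {
    if (firstChunkMs === null) {
      // Time to first audio: how long the listener waits before playback can start
      firstChunkMs = performance.now() - start;
    }
    totalBytes += chunk.length;
  }
  return { firstChunkMs, totalMs: performance.now() - start, totalBytes };
}
```

For the same text, firstChunkMs is roughly how long a listener waits with the streaming API, while totalMs approximates the wait if you had to receive all the audio before playing it, as with the async API.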

Use a PlayHT2.0 voice

You must use a PlayHT2.0 voice; these are the only voices supported by our real-time Turbo model. Use the voices endpoint, or the listVoices() method in our Node.js SDK, to get the full list of prebuilt voices.

All instant clones are supported by the Turbo model, but high-fidelity clones are not yet supported.

Here is an example voice object. Make sure the voice you use for streaming has "voice_engine": "PlayHT2.0".

```json
{
  "id": "s3://voice-cloning-zero-shot/801a663f-efd0-4254-98d0-5c175514c3e8/jennifer/manifest.json",
  "name": "Jennifer",
  "sample": "https://peregrine-samples.s3.amazonaws.com/parrot-samples/jennifer.wav",
  "accent": "american",
  "age": "adult",
  "gender": "female",
  "language": "English (US)",
  "language_code": "en-US",
  "loudness": null,
  "style": null,
  "tempo": null,
  "texture": null,
  "is_cloned": false,
  "voice_engine": "PlayHT2.0"
}
```
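If you pick voices programmatically, you can filter the voice list on this field before streaming. The following is a minimal sketch assuming voice objects shaped like the example above (snake_case fields as returned by the voices endpoint); the SDK's listVoices() may use different field names, so adjust accordingly. The helper names are hypothetical, and the check cannot tell instant clones from high-fidelity clones.

```javascript
// Keep only voices the Turbo model can stream, based on the voice_engine field.
function turboCompatibleVoices(voices) {
  return voices.filter((voice) => voice.voice_engine === "PlayHT2.0");
}

// Fail fast before streaming if the chosen voice would not work with Turbo.
function requireTurboCompatibleVoice(voices, voiceId) {
  const voice = voices.find((v) => v.id === voiceId);
  if (!voice) {
    throw new Error(`Voice not found: ${voiceId}`);
  }
  if (voice.voice_engine !== "PlayHT2.0") {
    throw new Error(`Voice ${voice.name} uses ${voice.voice_engine}; Turbo needs a PlayHT2.0 voice`);
  }
  return voice;
}
```

Checking up front turns an unsupported voice into a clear local error instead of a failed or slow streaming request.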

Use our SDKs

Our Voice API server uses gRPC under the hood for the fastest network communication. Our Node.js and Python SDKs talk to it directly through a gRPC client, with no HTTP proxy in between, which saves some of the network latency you might see with the REST API. The difference is small, but if you want the best latency, we recommend using the SDKs.

Deploy your servers in the US

Our API and GPU servers are deployed in the AWS US-East and US-West regions; we recommend deploying your servers in a US region to avoid the extra network latency of cross-region traffic.

If you absolutely need to run your servers outside the US and that is impacting your latency, consider our on-prem deployment offering, which lets you deploy in your own cloud in any region.

Upgrade to our enterprise cluster

If you want to have the lowest latency and highest concurrency all the time with an enterprise-grade SLA, contact us to get access to our enterprise cluster.

Use an on-prem deployment

If you need the absolute lowest latency (~100ms), consider our on-prem offering, which lets you easily deploy the PlayHT API and models in your own cloud.