API Reference

Enhance your app with our ultra-fast text-in, audio-out API. Transform your user experience with the power of voice.

To use our WebSocket API, you will need an api_key and a user_id from your PlayHT account.

To fully leverage our WebSocket API, the steps are:

  • Send a POST request to https://api.play.ht/api/v4/websocket-auth with Authorization: Bearer <your_api_key> and X-User-Id: <your_user_id> headers
  • Receive a JSON response with a websocket_urls field containing one WebSocket URL per model
  • Connect to the URL for the model you want to use
  • Send TTS commands with the same options as our TTS streaming API, e.g., {"text":"Hello World","voice":"...","output_format":"mp3"}
  • For each request, receive, in order, a {"type":"start"} message, the audio output as binary messages, and a {"type":"end"} message that marks the end of the request.

Supported Models

Our WebSocket API supports our latest models: Play3.0-mini and PlayDialog (with multilingual support as PlayDialogMultilingual).


Establishing a WebSocket Connection

To establish a WebSocket connection, you will need to send a POST request to the https://api.play.ht/api/v4/websocket-auth endpoint with the following headers:

Authorization: Bearer <your_api_key>
X-User-Id: <your_user_id>
Content-Type: application/json

You can obtain your api_key and user_id from your PlayHT account.

The response is a JSON object with a websocket_urls field that maps each engine/model to its WebSocket URL. Use the URL for the model you want to connect to the WebSocket server.

{
  "websocket_urls": {
    "Play3.0-mini": "wss://ws.fal.run/playht-fal/playht-tts/ws?fal_jwt_token=<your_session_token>",
    "PlayDialog": "wss://ws.fal.run/playht-fal/playht-tts-ldm/ws?fal_jwt_token=<your_session_token>",
    "PlayDialogMultilingual": "wss://ws.fal.run/playht-fal/playht-tts-multilingual-ldm/ws?fal_jwt_token=<your_session_token>"
  },
  "expires_at": "2024-12-11T22:17:37.429Z"
}

For example, for PlayDialog (English only), pass the URL at websocket_urls["PlayDialog"] to your WebSocket client to establish a connection:

const ws = new WebSocket('wss://ws.fal.run/playht-fal/playht-tts-ldm/ws?fal_jwt_token=<your_session_token>');

The URLs are valid until the time given in expires_at. If you need longer connections, call the websocket-auth endpoint again to obtain fresh URLs.
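
The authentication steps above can be sketched as follows. This is a minimal sketch rather than an official client: buildAuthRequest, selectWebSocketUrl, and fetchWebSocketUrl are hypothetical helper names, and the expiry check assumes expires_at is an ISO 8601 timestamp, as in the sample response.

```javascript
// Endpoint and header names are taken from the section above.
const AUTH_ENDPOINT = 'https://api.play.ht/api/v4/websocket-auth';

// Build the arguments for fetch(). Hypothetical helper name.
function buildAuthRequest(apiKey, userId) {
  return {
    url: AUTH_ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'X-User-Id': userId,
        'Content-Type': 'application/json',
      },
    },
  };
}

// Pick the URL for one model and reject responses that have already expired.
function selectWebSocketUrl(authResponse, model, now = new Date()) {
  const url = authResponse.websocket_urls?.[model];
  if (!url) {
    throw new Error(`No WebSocket URL for model "${model}"`);
  }
  if (new Date(authResponse.expires_at) <= now) {
    throw new Error('WebSocket URL has expired; call websocket-auth again');
  }
  return url;
}

// Request fresh URLs and return the one for the given model.
// Run this where the API key is safe (for example, on your server).
async function fetchWebSocketUrl(apiKey, userId, model) {
  const { url, options } = buildAuthRequest(apiKey, userId);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`websocket-auth failed with HTTP ${response.status}`);
  }
  return selectWebSocketUrl(await response.json(), model);
}
```

Splitting the request construction from the URL selection keeps the API key on the server while still letting client-side code receive only the resulting WebSocket URL.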


Sending TTS Commands

Once connected to the WebSocket, you can send TTS commands as JSON messages. The structure of these commands is similar to our TTS streaming API. Here's an example:

const ttsCommand = {
  text: "Hello, world! This is a test of the PlayHT TTS WebSocket API.",
  voice: "s3://voice-cloning-zero-shot/775ae416-49bb-4fb6-bd45-740f205d20a1/jennifersaad/manifest.json",
  output_format: "mp3",
  temperature: 0.7
};

ws.send(JSON.stringify(ttsCommand));

Some of the available options for the TTS command are:

  • text (required): The text to be converted to speech.
  • voice (required): The voice ID or URL to use for synthesis.
  • output_format (optional): The desired audio format (default is "mp3").
  • quality (optional): The quality of the audio ("draft", "standard", or "premium").
  • temperature (optional): Controls the randomness of the generated speech (0.0 to 1.0).
  • speed (optional): The speed of the generated speech (0.5 to 2.0).

For the complete list of parameters, refer to the TTS API documentation.
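
Validating a command before sending it can catch mistakes early. The sketch below checks only the options listed above, using the ranges stated there; buildTtsCommand and assertInRange are hypothetical helpers, not part of any PlayHT SDK, and the server remains the authority on what is accepted.

```javascript
// Allowed values and ranges copied from the option list above.
const QUALITIES = ['draft', 'standard', 'premium'];

function assertInRange(name, value, min, max) {
  if (typeof value !== 'number' || Number.isNaN(value) || value < min || value > max) {
    throw new RangeError(`${name} must be a number between ${min} and ${max}`);
  }
}

// Hypothetical helper: validates the options and returns the JSON string to send.
function buildTtsCommand({ text, voice, ...optional }) {
  if (typeof text !== 'string' || text.trim() === '') {
    throw new TypeError('text is required');
  }
  if (typeof voice !== 'string' || voice === '') {
    throw new TypeError('voice is required');
  }
  if (optional.quality !== undefined && !QUALITIES.includes(optional.quality)) {
    throw new RangeError(`quality must be one of: ${QUALITIES.join(', ')}`);
  }
  if (optional.temperature !== undefined) {
    assertInRange('temperature', optional.temperature, 0.0, 1.0);
  }
  if (optional.speed !== undefined) {
    assertInRange('speed', optional.speed, 0.5, 2.0);
  }
  // Serialize once so the result can be passed straight to ws.send().
  return JSON.stringify({ text, voice, ...optional });
}
```

With an open connection, usage would look like ws.send(buildTtsCommand({ text: "Hello World", voice: "...", speed: 1.2 })). Options not checked here are passed through unchanged.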


Receiving Audio Output

For each message you send over the connection, you will get back:

  1. A JSON message that signals the START of processing, containing:

    • "type": "start"
    • "status": 200, or the status code of the response otherwise
    • "request_id": "...", the UUID that identifies the request (useful for correlating with the end message below, as well as for debugging)
    • "headers": {…}, the response headers
  2. One or more binary messages with the actual response data

    • The response is streamed in chunks, so there may be several of these messages
    • Each chunk arrives as binary data (bytes)
  3. A final JSON message that signals the END of processing, containing:

    • "type": "end"
    • "status": 200, or the status code of the response otherwise
    • "request_id": "..."

To handle these messages and play the audio, you can use the following approach:

let audioChunks = [];

ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // Received binary audio data
    audioChunks.push(event.data);
  } else {
    // Received a text message
    const message = JSON.parse(event.data);
    if (message.type === 'end') {
      // End of audio stream, play the audio
      // If you specified a different output_format, you may need to adjust the audio player logic accordingly
      const audioBlob = new Blob(audioChunks, { type: 'audio/mpeg' });
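      const audioUrl = URL.createObjectURL(audioBlob);
      const audio = new Audio(audioUrl);
      // Release the object URL once playback finishes to avoid leaking memory
      audio.onended = () => URL.revokeObjectURL(audioUrl);
      audio.play();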
      
      // Clear the audio chunks for the next request
      audioChunks = [];
    }
  }
};

This code collects the binary audio chunks as they arrive and combines them into a single audio blob when the end-of-request message ({"type":"end"}) is received. It then creates an object URL for the blob and plays it with an HTMLAudioElement (the Audio constructor).
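
The start and end messages also carry status and request_id, which the handler above ignores. The following sketch tracks them so that a failed request is not played as audio. createResponseAssembler is a hypothetical helper, and treating status 200 as the only success value is an assumption based on the message descriptions above.

```javascript
// Hypothetical helper. Feed it every WebSocket message; it returns null until
// a request finishes, then an object describing the completed request.
function createResponseAssembler() {
  let current = null; // state of the request that is currently streaming

  return function handleMessage(data) {
    if (typeof data !== 'string') {
      // Binary message: audio bytes for the request in progress.
      if (current) current.chunks.push(data);
      return null;
    }

    const message = JSON.parse(data);
    if (message.type === 'start') {
      current = {
        requestId: message.request_id,
        status: message.status,
        chunks: [],
      };
      return null;
    }

    if (message.type === 'end' && current) {
      const finished = {
        ...current,
        // Assumption: 200 on both messages and matching IDs means success.
        ok:
          current.status === 200 &&
          message.status === 200 &&
          message.request_id === current.requestId,
      };
      current = null;
      return finished;
    }
    return null;
  };
}
```

In a browser this could be wired up as ws.onmessage = (event) => { const result = handle(event.data); ... }, where handle is the function returned by createResponseAssembler(), building the audio Blob from result.chunks only when result.ok is true.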


Error Handling

It's important to implement error handling in your WebSocket client. Here's an example of how to handle errors and connection closures:

ws.onerror = (error) => {
  console.error('WebSocket Error:', error);
};

ws.onclose = (event) => {
  console.log('WebSocket connection closed:', event.code, event.reason);
  // Implement reconnection logic if needed
};
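
One possible shape for the reconnection logic is exponential backoff. The sketch below is an illustration, not a prescribed approach: backoffDelay and connectWithRetry are hypothetical helpers, the delay values and attempt limit are arbitrary, and getWebSocketUrl stands for a function you supply that returns a fresh URL (for example, by calling the websocket-auth endpoint again, since URLs expire).

```javascript
// Delay before reconnect attempt N (0-based): 500 ms, 1 s, 2 s, ... capped at 30 s.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// getWebSocketUrl: caller-supplied async function returning a fresh URL.
// onOpen: called with the connected WebSocket each time a connection opens.
function connectWithRetry(getWebSocketUrl, onOpen, maxAttempts = 5) {
  let attempt = 0;

  const retry = () => {
    if (attempt >= maxAttempts) {
      console.error(`Giving up after ${maxAttempts} reconnection attempts`);
      return;
    }
    setTimeout(openSocket, backoffDelay(attempt));
    attempt += 1;
  };

  const openSocket = async () => {
    let ws;
    try {
      ws = new WebSocket(await getWebSocketUrl());
    } catch (error) {
      console.error('Could not open WebSocket:', error);
      retry();
      return;
    }
    ws.onopen = () => {
      attempt = 0; // a successful connection resets the backoff
      onOpen(ws);
    };
    ws.onerror = (error) => console.error('WebSocket Error:', error);
    ws.onclose = (event) => {
      console.log('WebSocket connection closed:', event.code, event.reason);
      if (event.code !== 1000) retry(); // 1000 is a normal closure
    };
  };

  openSocket();
}
```

Fetching a new URL on every attempt avoids reconnecting with a URL that has passed its expires_at time.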

Best Practices

  1. Authentication: Always keep your API key and User ID secure. While the WebSocket URL can be shared with client-side code, the API Key and User ID should be kept private.

  2. Error Handling: Implement robust error handling and reconnection logic in your WebSocket client.

  3. Resource Management: Close the WebSocket connection when it's no longer needed to free up server resources.

  4. Rate Limiting: Be aware of rate limits on the API and implement appropriate throttling in your application.

  5. Testing: Thoroughly test your implementation with various inputs and network conditions to ensure reliability.

By following these guidelines and using the provided examples, you can effectively integrate the PlayHT TTS WebSocket API into your application, enabling real-time text-to-speech functionality with low latency and high performance.