Integrating PlayHT Voices with Twilio

Incorporating PlayHT's realistic voices into your Twilio application can significantly enhance the caller experience. To stream PlayHT audio to Twilio over a bi-directional stream, the audio must meet these specifications:

  • Output Format: The audio must be in the 'mulaw' format.
  • Sample Rate: Use a sample rate of 8000 Hz.
  • Encoding: Encode the audio in base64 format before sending it to Twilio through a 'media' event message.
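
As a rough illustration of the three requirements above, the sketch below wraps one chunk of 8000 Hz mulaw audio in a Twilio 'media' message. The helper name buildMediaMessage and its arguments are hypothetical and not part of either SDK; the sketch assumes the chunk is a Node.js Buffer that already holds mulaw audio, and that streamSid is the identifier Twilio sent when the stream started.

```javascript
// Hypothetical helper (not part of the PlayHT or Twilio SDKs).
// Assumes mulawChunk is a Buffer of 8000 Hz mulaw audio.
function buildMediaMessage(streamSid, mulawChunk) {
  return JSON.stringify({
    event: 'media',
    streamSid, // identifies which Twilio stream should play the audio
    media: {
      // Twilio expects the raw audio bytes as a base64 string
      payload: mulawChunk.toString('base64'),
    },
  });
}
```

The returned string can then be sent as-is over the WebSocket that Twilio opened, for example with ws.send(buildMediaMessage(streamSid, chunk)).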


Make sure you use the <Connect> TwiML verb with a nested <Stream> to create a bi-directional stream.
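
For reference, Twilio opens a bi-directional stream when the call's TwiML contains a <Stream> nested inside <Connect>. The sketch below builds such a TwiML response as a plain string. The function name buildConnectStreamTwiml is hypothetical, and the WebSocket URL is a placeholder for your own server, which Twilio requires to be a wss:// address.

```javascript
// Hypothetical helper: returns TwiML that asks Twilio to open a
// bi-directional Media Stream to the given WebSocket URL.
// wsUrl is an assumed placeholder for your own wss:// endpoint.
function buildConnectStreamTwiml(wsUrl) {
  // Escape characters that cannot appear raw in an XML attribute value.
  const safeUrl = wsUrl
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="${safeUrl}" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}
```

Your voice webhook would return this string with a Content-Type of text/xml, for example buildConnectStreamTwiml('wss://example.com/media').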

Implementation example with the Node.js SDK

Here is an example using the Node.js SDK for audio streaming:

import * as PlayHT from 'playht';


// Assuming ws is a WebSocket connected to Twilio and streamSid contains the stream identifier
async function streamGeneratedAudio(ws, streamSid) {
  // Request 8000 Hz mulaw audio, the format Twilio expects
  const streamFromStream = await PlayHT.stream('Hello from realistic voices ready for the phone.', {
    voiceEngine: 'PlayHT2.0-turbo',
    voiceId: 's3://peregrine-voices/oliver_narrative2_parrot_saad/manifest.json',
    outputFormat: 'mulaw',
    sampleRate: 8000,
  });

  streamFromStream.on('data', (data) => {
    const message = JSON.stringify({
      event: 'media',
      media: {
        payload: data.toString('base64'),