Streaming with Twilio
Integrating Play3.0 TTS with Twilio for Phone-Based AI Interactions
This guide walks you through integrating PlayHT's Play3.0 Text-to-Speech (TTS) voice model with Twilio to build a phone-based AI interaction system. It uses ChatGPT to generate responses, but you can adapt it to work with other LLMs.
Prerequisites
- Node.js installed on your system
- An OpenAI API key
- A PlayHT API key and User ID
- A Twilio account with a phone number
- ngrok for exposing your local server to the internet (for development)
Step 1: Set Up Your Project
- Create a new directory for your project and navigate to it:
mkdir twilio-playht-ai-phone
cd twilio-playht-ai-phone
- Initialize a new Node.js project:
npm init -y
- Open the package.json file and add the following line to enable ES modules:
{ ... "type": "module", ... }
- Install the required dependencies:
npm install openai playht dotenv express twilio axios
Step 2: Set Up Environment Variables
Create a .env file in your project root and add your API keys:
OPENAI_API_KEY=your_openai_api_key_here
PLAYHT_API_KEY=your_playht_api_key_here
PLAYHT_USER_ID=your_playht_user_id_here
TWILIO_ACCOUNT_SID=your_twilio_account_sid_here
TWILIO_AUTH_TOKEN=your_twilio_auth_token_here
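A missing or misspelled variable in .env otherwise tends to surface only later, as an authentication error from one of the services. As a minimal sketch (the helper name `findMissingEnvVars` is hypothetical and not part of any library), the server could check for the five variables above at startup:

```javascript
// Names of the variables defined in the .env file above.
const REQUIRED_ENV_VARS = [
  'OPENAI_API_KEY',
  'PLAYHT_API_KEY',
  'PLAYHT_USER_ID',
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
];

// Hypothetical helper: returns the names that are unset or blank in `env`.
function findMissingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Usage: call this after dotenv.config() has populated process.env.
const missing = findMissingEnvVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```

Whether to warn or to exit when something is missing is a design choice; warning keeps the sketch non-fatal during local experiments.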
Step 3: Create the AI Response Generation Function
Create a file named generateAIResponse.js with the following content:
import OpenAI from 'openai';
import dotenv from 'dotenv';

dotenv.config();

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function generateAIResponse(prompt) {
  try {
    const completion = await openai.chat.completions.create({
      messages: [{ role: "user", content: prompt }],
      model: "gpt-3.5-turbo",
    });
    return completion.choices[0].message.content;
  } catch (error) {
    console.error('Error generating AI response:', error);
    return "I'm sorry, I couldn't generate a response at this time.";
  }
}
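Because the reply is read aloud to a caller, it may help to steer the model toward short, plain-text answers. The following is a minimal sketch under that assumption: `buildMessages` and the wording of the system prompt are hypothetical, and the returned array would take the place of the inline `messages` value passed to `openai.chat.completions.create`.

```javascript
// Assumed instruction text; tune the wording for your own use case.
const PHONE_SYSTEM_PROMPT =
  'You are a voice assistant on a phone call. ' +
  'Answer in one or two short sentences of plain text, without lists or markdown.';

// Hypothetical helper: builds the `messages` array for a chat completion request.
function buildMessages(prompt, systemPrompt = PHONE_SYSTEM_PROMPT) {
  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: prompt },
  ];
}

// Example: inspect the array that would be sent to the model.
console.log(buildMessages('What is the capital of France?'));
```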
Step 4: Create the Text-to-Speech Function
Create a file named textToSpeech.js with the following content:
import * as PlayHT from 'playht';
import dotenv from 'dotenv';

dotenv.config();

PlayHT.init({
  apiKey: process.env.PLAYHT_API_KEY,
  userId: process.env.PLAYHT_USER_ID,
});

export async function textToSpeech(text) {
  try {
    const response = await PlayHT.generate(text, {
      voiceId: "s3://voice-cloning-zero-shot/801a663f-efd0-4254-98d0-5c175514c3e8/jennifer/manifest.json",
      voiceEngine: "Play3.0",
      outputFormat: 'mulaw',
      sampleRate: 8000,
    });
    return response.audioUrl;
  } catch (error) {
    console.error('Error generating speech:', error);
    throw error;
  }
}
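The `mulaw` format at a `sampleRate` of 8000 corresponds to the 8 kHz, 8-bit μ-law encoding commonly used on telephone networks, so one second of audio is roughly 8000 bytes. As a rough sketch, assuming headerless μ-law data at one byte per sample (an assumption about the payload, not something the PlayHT response is known to guarantee), the playback length of a clip can be estimated from its size. The helper name is hypothetical.

```javascript
// Hypothetical helper: estimates clip length for raw 8-bit mu-law audio,
// where each sample occupies exactly one byte.
function estimateMulawDurationSeconds(byteLength, sampleRate = 8000) {
  if (!Number.isFinite(byteLength) || byteLength < 0) {
    throw new RangeError('byteLength must be a non-negative number');
  }
  if (!Number.isFinite(sampleRate) || sampleRate <= 0) {
    throw new RangeError('sampleRate must be a positive number');
  }
  return byteLength / sampleRate;
}

// 40000 bytes at 8000 bytes per second is 5 seconds of audio.
console.log(estimateMulawDurationSeconds(40000)); // prints 5
```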
Step 5: Create the Main Application
Create a file named index.js with the following content:
import express from 'express';
import dotenv from 'dotenv';
import twilio from 'twilio';
import { generateAIResponse } from './generateAIResponse.js';
import { textToSpeech } from './textToSpeech.js';

dotenv.config();

const app = express();
// Twilio sends webhook parameters as a URL-encoded form body.
app.use(express.urlencoded({ extended: true }));

const twilioClient = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

// Greet the caller and collect one spoken utterance.
app.post('/voice', (req, res) => {
  const twiml = new twilio.twiml.VoiceResponse();
  // With input: 'speech', Twilio transcribes the caller's speech and posts
  // the text to the action URL as SpeechResult. A recording's transcription,
  // by contrast, arrives later in a separate callback that cannot control the call.
  const gather = twiml.gather({
    input: 'speech',
    action: '/process-speech',
    speechTimeout: 'auto',
  });
  gather.say('Welcome to the AI Phone Assistant. Please ask your question.');
  // Reached only if the caller says nothing: start over.
  twiml.redirect('/voice');
  res.type('text/xml');
  res.send(twiml.toString());
});

// Turn the caller's words into an AI reply and play it back.
app.post('/process-speech', async (req, res) => {
  const twiml = new twilio.twiml.VoiceResponse();
  const speech = req.body.SpeechResult;
  if (speech) {
    try {
      const aiResponse = await generateAIResponse(speech);
      const audioUrl = await textToSpeech(aiResponse);
      twiml.play(audioUrl);
      twiml.say('Thank you for using the AI Phone Assistant. Goodbye!');
    } catch (error) {
      // textToSpeech rethrows on failure, so still answer the call with valid TwiML.
      console.error('Error processing speech:', error);
      twiml.say("I'm sorry, something went wrong. Goodbye!");
    }
    twiml.hangup();
  } else {
    twiml.say("I'm sorry, I couldn't understand that. Please try again.");
    twiml.redirect('/voice');
  }
  res.type('text/xml');
  res.send(twiml.toString());
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}`);
});
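Both the LLM call and the speech generation run while Twilio is waiting for the webhook response, and Twilio waits only a limited time (about 15 seconds, according to its webhook documentation) before treating the request as failed. The following is a minimal sketch of a guard, assuming a hypothetical helper `withTimeout` that resolves to a fallback value when the work takes too long. The timings in the usage example are arbitrary.

```javascript
// Hypothetical helper: resolves with `fallback` if `promise` has not settled within `ms`.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with a stand-in for a slow generateAIResponse call.
const slowReply = new Promise((resolve) => setTimeout(() => resolve('late answer'), 200));
withTimeout(slowReply, 50, 'Sorry, that took too long.').then((reply) => {
  console.log(reply); // prints the fallback, because 50 ms elapses first
});
```

In /process-speech, the same wrapper could surround the generateAIResponse and textToSpeech calls. This sketch does not cancel the slower operation, which keeps running in the background after the fallback is returned.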
Step 6: Set Up ngrok
- Install ngrok globally:
npm install -g ngrok
- Start your Express server:
node index.js
- In a new terminal window, start ngrok:
ngrok http 3000
- Note the HTTPS URL provided by ngrok (e.g., https://your-ngrok-subdomain.ngrok.io).
Step 7: Configure Twilio
- Log in to your Twilio account.
- Navigate to the Phone Numbers section and select your Twilio phone number.
- In the Voice & Fax section, set the "A Call Comes In" webhook to:
  - Webhook: https://your-ngrok-subdomain.ngrok.io/voice
  - HTTP Method: POST
Step 8: Test Your Integration
- Call your Twilio phone number.
- After the greeting, speak your question or prompt.
- You should receive an AI-generated response spoken using the Play3.0 TTS voice.
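Before placing a real call, it can be useful to confirm that the /voice endpoint answers with TwiML. The following is a minimal sketch that imitates a Twilio webhook from a separate script, assuming Node.js 18 or later (for the built-in `fetch`) and the server listening on port 3000. The helper names and the sample field values are hypothetical placeholders.

```javascript
// Hypothetical helper: encodes fields as application/x-www-form-urlencoded,
// the body format that express.urlencoded() parses.
function encodeWebhookBody(fields) {
  return new URLSearchParams(fields).toString();
}

// Hypothetical helper: POSTs a simulated webhook and returns the response body.
async function simulateWebhook(url, fields) {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: encodeWebhookBody(fields),
  });
  return response.text();
}

// Placeholder values; a real request from Twilio carries many more parameters.
const sampleFields = { CallSid: 'CA00000000000000000000000000000000', From: '+15550100' };
console.log(encodeWebhookBody(sampleFields));

// With the server from Step 5 running locally, uncomment to print the TwiML:
// simulateWebhook('http://localhost:3000/voice', sampleFields).then(console.log);
```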
Customization
- To use a different voice, change the voiceId in the textToSpeech.js file.
- To use a different LLM, modify the generateAIResponse function in generateAIResponse.js.
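One way to keep the LLM swappable is to separate the provider call from the error handling. The following is a minimal sketch, assuming a hypothetical factory `createResponder` that wraps any async function taking a prompt and returning text; the stand-in generator below simply echoes its input.

```javascript
// Hypothetical factory: wraps any async text generator with the same fallback behavior.
function createResponder(
  generate,
  fallback = "I'm sorry, I couldn't generate a response at this time."
) {
  return async function respond(prompt) {
    try {
      const text = await generate(prompt);
      // Treat an empty reply like a failure so the caller always hears something.
      return typeof text === 'string' && text.trim() !== '' ? text : fallback;
    } catch (error) {
      console.error('Error generating AI response:', error);
      return fallback;
    }
  };
}

// Stand-in generator; a real one would call your chosen provider's SDK.
const echoResponder = createResponder(async (prompt) => `You said: ${prompt}`);
echoResponder('hello').then(console.log); // prints "You said: hello"
```

With this shape, switching providers means passing a different `generate` function, while the rest of the application keeps calling the returned responder.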
Conclusion
You've now successfully integrated PlayHT's Play3.0 TTS voice model with Twilio for phone-based AI interactions. This setup allows callers to interact with an AI system over the phone, with responses generated by ChatGPT and spoken using PlayHT's TTS.