The pyht PlayHT Python SDK provides easy access to the PlayHT streaming API.
Play3.0-mini Support
The 3.0-mini model is now the default in the Python SDK. You can use either HTTP (voice engine Play3.0-mini-http, default) or WebSockets (voice engine Play3.0-mini-ws) to work with the 3.0-mini model.
To use our most advanced PlayDialog model, for cases that require highly emotive speech and dialogues, use PlayDialog-http for HTTP and PlayDialog-ws for WebSockets.
To use the older 2.0-turbo model (which uses gRPC), use voice engine PlayHT2.0-turbo.
The 3.0-mini model introduces multilingual capabilities. As described below, you can set the language parameter to use other languages. If this parameter is unset, English is used by default.
The WebSockets API should have lower latency per request once a connection is established, but initially establishing the connection will include some overhead.
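The voice engine names above combine a model with a transport. As a rough sketch, a small lookup can map that choice onto the engine strings listed in this section. The helper name `pick_voice_engine` and its `model` and `transport` arguments are illustrative assumptions and not part of pyht; only the returned strings come from the text above.

```python
# Hypothetical helper (not part of pyht): maps a model and a transport to one
# of the voice engine strings named in this README.
VOICE_ENGINES = {
    ("3.0-mini", "http"): "Play3.0-mini-http",  # SDK default
    ("3.0-mini", "ws"): "Play3.0-mini-ws",
    ("dialog", "http"): "PlayDialog-http",
    ("dialog", "ws"): "PlayDialog-ws",
    ("2.0-turbo", "grpc"): "PlayHT2.0-turbo",  # legacy model, gRPC only
}


def pick_voice_engine(model: str = "3.0-mini", transport: str = "http") -> str:
    """Return the voice engine string for a model/transport pair."""
    try:
        return VOICE_ENGINES[(model, transport)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {model!r} over {transport!r}"
        ) from None


print(pick_voice_engine())
print(pick_voice_engine("dialog", "ws"))
```

The returned string is the kind of value passed as the voice_engine argument of tts in the Usage section.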
pyht is a Python SDK for PlayHT's AI Text-to-Speech API. PlayHT builds conversational voice AI models for realtime use cases. With pyht, you can easily convert text into high-quality audio streams with humanlike voices.
Currently the library supports only streaming text-to-speech. For the full set of functionality provided by the PlayHT API, such as Voice Cloning, see the PlayHT docs.
Features
- Stream text-to-speech in real-time, synchronous or asynchronous.
- Use PlayHT's pre-built voices or create custom voice clones.
- Stream text from an LLM and generate an audio stream in real time.
- Supports WAV, MP3, Mulaw, FLAC, and OGG audio formats as well as raw audio.
- Supports 8 kHz, 16 kHz, 24 kHz, 44.1 kHz, and 48 kHz sample rates.
Requirements
- Python 3.8+
- aiohttp
- filelock
- grpc
- requests
- websockets
Installation
You can install the pyht SDK using pip:
pip install pyht
Usage
You can use the pyht SDK by creating a Client instance and calling its tts method. Here's a simple example:
from pyht import Client
from dotenv import load_dotenv
from pyht.client import TTSOptions
import os

# Load PLAY_HT_USER_ID and PLAY_HT_API_KEY from a .env file, if one exists
load_dotenv()

client = Client(
    user_id=os.getenv("PLAY_HT_USER_ID"),
    api_key=os.getenv("PLAY_HT_API_KEY"),
)
options = TTSOptions(voice="s3://voice-cloning-zero-shot/775ae416-49bb-4fb6-bd45-740f205d20a1/jennifersaad/manifest.json")

# Open a file to save the audio
with open("output_jenn.wav", "wb") as audio_file:
    for chunk in client.tts("Hi, I'm Jennifer from Play. How can I help you today?", options, voice_engine="PlayDialog-http"):
        # Write the audio chunk to the file
        audio_file.write(chunk)

print("Audio saved as output_jenn.wav")
It is also possible to stream text instead of submitting it as a string all at once:
for chunk in client.stream_tts_input(some_iterable_text_stream, options):
    # do something with the audio chunk
    print(type(chunk))
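stream_tts_input takes an iterable of text rather than one complete string. One way to produce such an iterable, sketched here with only the standard library, is a generator that yields sentence-sized pieces as fragments arrive. `sentences_from_tokens` is a hypothetical helper and the fragment list stands in for LLM output; neither is part of pyht.

```python
from typing import Iterable, Iterator


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Group a stream of text fragments into sentence-sized strings.

    Hypothetical helper, not part of pyht. It buffers fragments until the
    buffer ends with sentence punctuation, then yields the buffered text.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush trailing text that never reached sentence punctuation.
        yield buffer.strip()


# Stand-in for text arriving piece by piece from an LLM.
fake_llm_tokens = ["Hi, ", "I'm Jennifer ", "from Play. ", "How can I ", "help you today?"]
text_stream = list(sentences_from_tokens(fake_llm_tokens))
print(text_stream)
```

A generator like this could take the place of some_iterable_text_stream in the call above.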
An asyncio version of the client is also available:
import asyncio
import os
from pyht import AsyncClient
from pyht.client import TTSOptions

async def main():
    client = AsyncClient(
        user_id=os.getenv("PLAY_HT_USER_ID"),
        api_key=os.getenv("PLAY_HT_API_KEY"),
    )
    options = TTSOptions(voice="s3://voice-cloning-zero-shot/775ae416-49bb-4fb6-bd45-740f205d20a1/jennifersaad/manifest.json")
    # async for must run inside a coroutine
    async for chunk in client.tts("Hi, I'm Jennifer from Play. How can I help you today?", options):
        # do something with the audio chunk
        print(type(chunk))

asyncio.run(main())
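With the async client, audio arrives through async for. A common pattern is to join the chunks into a single bytes object. The sketch below shows that pattern against a stand-in async generator so that it runs without credentials; `fake_tts` and `collect_audio` are hypothetical names, and it assumes each chunk is a bytes-like object, as the file-writing example above suggests.

```python
import asyncio
from typing import AsyncIterator


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    """Stand-in for an async TTS stream; yields the text as 8-byte chunks."""
    data = text.encode("utf-8")
    for start in range(0, len(data), 8):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield data[start:start + 8]


async def collect_audio(chunks: AsyncIterator[bytes]) -> bytes:
    """Join every chunk from an async iterator into one bytes object."""
    buffer = bytearray()
    async for chunk in chunks:
        buffer.extend(chunk)
    return bytes(buffer)


audio = asyncio.run(collect_audio(fake_tts("Hi, I'm Jennifer from Play.")))
print(len(audio))
```

In real use, the async iterator returned by the client's tts call would be passed to collect_audio in place of fake_tts.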
The tts method takes the following arguments:
- text: The text to be converted to speech.
  - a string or list of strings.
- options: The options to use for the TTS request.
  - a TTSOptions object (see below).
- voice_engine: The voice engine to use for the TTS request.
  - PlayDialog-ws or PlayDialog-http: Our latest, more advanced model, which can generate turn-based dialogues with multiple voices (ws for WebSockets and http for HTTP).
  - Play3.0-mini-http (default): Our multilingual model, streaming audio over HTTP. (NOTE that it is Play, not PlayHT like previous voice engines.)
  - Play3.0-mini-ws: Our multilingual model, streaming audio over WebSockets. (NOTE that it is Play, not PlayHT like previous voice engines.)
  - PlayHT2.0-turbo: Our legacy English-only model, streaming audio over gRPC.
TTSOptions
The TTSOptions class is used to specify the options for the TTS request. It has the following members, with these supported values:
- voice: The voice to use for the TTS request; a string.
  - A URL pointing to a Play voice manifest file.
- format: The format of the audio to be returned; a Format enum value.
  - FORMAT_MP3 (default)
  - FORMAT_WAV
  - FORMAT_MULAW
  - FORMAT_FLAC
  - FORMAT_OGG
  - FORMAT_RAW
- sample_rate: The sample rate of the audio to be returned; an integer.
  - 8000
  - 16000
  - 24000
  - 44100
  - 48000
- quality: DEPRECATED (use sample rate to adjust audio quality).
- speed: The speed of the audio to be returned, a float (default 1.0).
- seed: Random seed to use for audio generation, an integer (default None, will be randomly generated).
- The following options are inference-time hyperparameters of the text-to-speech model; if unset, the model will use default values chosen by PlayHT.
  - temperature: The temperature of the model, a float.
  - top_p: The top_p of the model, a float.
  - text_guidance: The text_guidance of the model, a float.
  - voice_guidance: The voice_guidance of the model, a float.
  - style_guidance (Play3.0-mini-http and Play3.0-mini-ws only): The style_guidance of the model, a float.
  - repetition_penalty: The repetition_penalty of the model, a float.
- disable_stabilization (PlayHT2.0-turbo only): Disable the audio stabilization process, a boolean (default False).
- language (Play3.0-mini-http and Play3.0-mini-ws only): The language of the text to be spoken, a Language enum value or None (default English).
  - AFRIKAANS
  - ALBANIAN
  - AMHARIC
  - ARABIC
  - BENGALI
  - BULGARIAN
  - CATALAN
  - CROATIAN
  - CZECH
  - DANISH
  - DUTCH
  - ENGLISH
  - FRENCH
  - GALICIAN
  - GERMAN
  - GREEK
  - HEBREW
  - HINDI
  - HUNGARIAN
  - INDONESIAN
  - ITALIAN
  - JAPANESE
  - KOREAN
  - MALAY
  - MANDARIN
  - POLISH
  - PORTUGUESE
  - RUSSIAN
  - SERBIAN
  - SPANISH
  - SWEDISH
  - TAGALOG
  - THAI
  - TURKISH
  - UKRAINIAN
  - URDU
  - XHOSA
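Several of the options above are documented for particular voice engines only. As a sketch of how a caller might catch a mismatch before sending a request, the helper below encodes just the restrictions and sample rates stated in this list. `unsupported_options` is a hypothetical function, not a pyht API, and this README does not say whether the SDK itself rejects or ignores such options.

```python
# Restrictions and values as stated in the option list above.
MINI_ENGINES = {"Play3.0-mini-http", "Play3.0-mini-ws"}
ENGINE_ONLY_OPTIONS = {
    "style_guidance": MINI_ENGINES,
    "language": MINI_ENGINES,
    "disable_stabilization": {"PlayHT2.0-turbo"},
}
SUPPORTED_SAMPLE_RATES = {8000, 16000, 24000, 44100, 48000}


def unsupported_options(voice_engine: str, **options) -> list:
    """Return the sorted names of options that do not fit the given engine.

    Hypothetical helper, not part of pyht. Options set to None are treated
    as unset and skipped.
    """
    problems = []
    for name, value in options.items():
        if value is None:
            continue
        engines = ENGINE_ONLY_OPTIONS.get(name)
        if engines is not None and voice_engine not in engines:
            problems.append(name)
    rate = options.get("sample_rate")
    if rate is not None and rate not in SUPPORTED_SAMPLE_RATES:
        problems.append("sample_rate")
    return sorted(problems)


# style_guidance is listed for the 3.0-mini engines only, and 22050 is not
# among the listed sample rates.
print(unsupported_options("PlayDialog-http", style_guidance=1.0, sample_rate=22050))
# The string "SPANISH" stands in for a Language enum value here.
print(unsupported_options("Play3.0-mini-http", language="SPANISH", sample_rate=24000))
```

An empty list means every option passed is documented for the chosen engine.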
Command-Line Demo
You can run the provided demo from the command line.
Note: This demo depends on the following packages:
pip install numpy soundfile
python demo/main.py --user $PLAY_HT_USER_ID --key $PLAY_HT_API_KEY --text "Hello from Play!"
To run with the asyncio client, use the --async flag:
python demo/main.py --user $PLAY_HT_USER_ID --key $PLAY_HT_API_KEY --text "Hello from Play!" --async
To run with the HTTP API, which uses our latest Play3.0-mini model, use the --http flag:
python demo/main.py --user $PLAY_HT_USER_ID --key $PLAY_HT_API_KEY --text "Hello from Play!" --http
To run with the WebSockets API, which also uses our latest Play3.0-mini model, use the --ws flag:
python demo/main.py --user $PLAY_HT_USER_ID --key $PLAY_HT_API_KEY --text "Hello from Play!" --ws
The HTTP and WebSockets APIs can also be used with the async client:
python demo/main.py --user $PLAY_HT_USER_ID --key $PLAY_HT_API_KEY --text "Hello from Play!" --http --async
python demo/main.py --user $PLAY_HT_USER_ID --key $PLAY_HT_API_KEY --text "Hello from Play!" --ws --async
Alternatively, you can run the demo in interactive mode:
python demo/main.py --user $PLAY_HT_USER_ID --key $PLAY_HT_API_KEY --interactive
In interactive mode, you can enter lines of text to generate and play audio on the fly. An empty line exits the interactive session.
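The demo's source is not reproduced in this README. Purely to illustrate how the flags shown above fit together, the sketch below declares them with argparse. It is an assumption about their shape (for example, that --async, --http, --ws, and --interactive are boolean switches and that --user and --key are required), not the actual contents of demo/main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags used in the commands above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the demo's flags")
    parser.add_argument("--user", required=True, help="PlayHT User ID")
    parser.add_argument("--key", required=True, help="PlayHT API Secret Key")
    parser.add_argument("--text", default=None, help="text to synthesize")
    # "async" is a Python keyword, so the value is stored under another name.
    parser.add_argument("--async", dest="use_async", action="store_true")
    parser.add_argument("--http", action="store_true")
    parser.add_argument("--ws", action="store_true")
    parser.add_argument("--interactive", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--user", "example-user", "--key", "example-key",
     "--text", "Hello from Play!", "--ws", "--async"]
)
print(args.text, args.ws, args.use_async, args.http)
```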
Get an API Key
To get started with the pyht SDK, you'll need your API Secret Key and User ID. Follow these steps to obtain them:
- Access the API Page: Navigate to the API Access page.
- Generate Your API Secret Key:
  - Click the "Generate Secret Key" button under the "Secret Key" section.
  - Your API Secret Key will be displayed. Ensure you copy it and store it securely.
- Locate Your User ID: Find and copy your User ID, which can be found on the same page under the "User ID" section.
Keep your API Secret Key confidential. It's crucial not to share it with anyone or include it in publicly accessible code repositories.
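One way to keep the key out of source code is to read both values from environment variables, as the usage example does with PLAY_HT_USER_ID and PLAY_HT_API_KEY. The sketch below adds an explicit check so that a missing value fails early with a clear message. `load_credentials` is a hypothetical helper, not part of pyht.

```python
import os


def load_credentials(env=os.environ):
    """Return (user_id, api_key) from the environment, or raise if missing.

    Hypothetical helper, not part of pyht. The variable names match the
    ones used in the usage example.
    """
    names = ("PLAY_HT_USER_ID", "PLAY_HT_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["PLAY_HT_USER_ID"], env["PLAY_HT_API_KEY"]


# Demonstration with a stand-in mapping instead of the real environment.
user_id, api_key = load_credentials(
    {"PLAY_HT_USER_ID": "example-user", "PLAY_HT_API_KEY": "example-key"}
)
print(user_id)
```

The two returned values correspond to the user_id and api_key arguments passed to Client in the usage example.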