PlayHT On-Prem allows you to deploy PlayHT’s best-in-class enterprise text-to-speech AI directly in your own cloud. This offers two big benefits:

Latency: <130ms response time
Security and Control: Your text and speech data stays on-prem in your cloud. You are in control of your data. PlayHT never sees your data.

If you’re building real-time speech applications (e.g. conversational AI) or handling sensitive data (e.g. health, financial, legal, PII), you should consider deploying on-prem.

To get started using PlayHT On-Prem, please schedule an onboarding session.

(estimated time: 8 min)

How it works

PlayHT On-Prem is deployed to your cloud as a virtual appliance. Your PlayHT On-Prem appliance contains all the cloud infrastructure and software necessary to turn text into speech. Because your appliance lives entirely in your cloud, it can sit as close to your software stack as possible, which keeps latency under 130ms.

Inside your appliance

At the highest level, your appliance consists of two virtual private clouds (VPCs): an Isolated VPC and a Control VPC.

The Isolated VPC is network-isolated from the outside world with a standard cloud firewall. This is where PlayHT’s text-to-speech AI runs on GPUs. This network isolation ensures that your text/speech data never leaves your cloud. The only way data can flow into or out of your Isolated VPC is through your Control VPC, which you fully control and which enforces rules about network traffic. This is how PlayHT On-Prem provides security and control.

The Control VPC contains:

Network Proxy: Polices network traffic into and out of the Isolated VPC. Outbound traffic is only allowed if it adheres to a strict security policy, and all outbound network traffic is sent to the Audit Logger.
Audit Logger: Appends all outbound network traffic leaving your appliance to an audit log (e.g. Google Cloud Logging, AWS CloudWatch).
Capacity Manager: Controls software updates and auto-scales your appliance so that it has the right GPU capacity to meet demand.
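
To make the division of labor between the Network Proxy and the Audit Logger concrete, here is a minimal sketch in Python. It is a toy model, not PlayHT's implementation: the class name, the host allowlist as the form of the "security policy", the example hostnames, and the choice to audit denied attempts as well as allowed ones are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ControlProxy:
    """Toy model of the Network Proxy plus Audit Logger (illustrative only)."""

    # Hypothetical policy: outbound traffic may only go to these hosts.
    allowed_hosts: Set[str]
    # Stand-in for the audit log (e.g. a cloud logging service).
    audit_log: List[Dict[str, object]] = field(default_factory=list)

    def send_outbound(self, host: str, kind: str) -> bool:
        """Return True if the traffic may leave; every attempt is audited."""
        allowed = host in self.allowed_hosts
        # Assumption: denied attempts are recorded too, not just allowed ones.
        self.audit_log.append({"host": host, "kind": kind, "allowed": allowed})
        return allowed


# Hypothetical hostnames, used only to exercise the sketch.
proxy = ControlProxy(allowed_hosts={"telemetry.example.com"})
telemetry_ok = proxy.send_outbound("telemetry.example.com", "usage-telemetry")
unknown_ok = proxy.send_outbound("unknown.example.net", "other")
```

The point of the design is that policy enforcement and auditing happen at a single choke point that you control, so the audit log is a complete record of what tried to leave.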

After you set up your appliance, PlayHT’s text-to-speech AI is downloaded and deployed on GPUs in your Isolated VPC. You’ll receive a custom endpoint for your appliance at {your-company-name}.on-prem.play.ht, to which you can send regular API requests using the PlayHT API SDK. Your appliance’s endpoint behaves just like PlayHT’s public endpoint.

Your appliance periodically sends performance and usage telemetry back to PlayHT’s cloud. PlayHT uses that telemetry to manage capacity in your appliance (scaling GPUs up and down) and for billing. As described above, network traffic leaving your appliance always goes into your audit log, so your engineering/security team can examine what your appliance is doing at any time.

Questions

How do I set up PlayHT On-Prem?

It takes less than one hour to set up a PlayHT On-Prem appliance in your cloud.

Here are the instructions depending upon your cloud provider:

For Amazon Web Services: PlayHT On-Prem: Getting started in AWS
For Google Cloud: PlayHT On-Prem: Getting started in Google Cloud
For private datacenters: Coming soon!

After you finish setup, your PlayHT On-Prem appliance will be available at {your-company-name}.on-prem.play.ht. You can use our SDK to send requests to your appliance; just configure it with your appliance’s endpoint and start sending requests with your API key. Your appliance’s endpoint behaves just like PlayHT’s public endpoint.
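
As a small illustration of the endpoint pattern, the following Python sketch derives an appliance base URL from a company name. The function name, the https scheme, and the DNS-label validation are assumptions added for this example; how the SDK itself is configured with that endpoint is not shown here, so refer to the SDK documentation for the actual setting.

```python
import re

# A single DNS label: letters, digits, and hyphens, not starting or ending with a hyphen.
_DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def appliance_base_url(company_name: str) -> str:
    """Build the appliance URL following {your-company-name}.on-prem.play.ht.

    Hypothetical helper: the https scheme is an assumption, not stated in the docs.
    """
    label = company_name.strip().lower()
    if not _DNS_LABEL.fullmatch(label):
        raise ValueError(f"not a valid hostname label: {company_name!r}")
    return f"https://{label}.on-prem.play.ht"


# "acme" is a placeholder company name.
base_url = appliance_base_url("acme")
```

Validating the name up front catches typos (spaces, underscores) before they turn into confusing DNS failures at request time.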

How are my API requests processed?

When you send an API request to {your-company-name}.on-prem.play.ht, the network proxy in your Control VPC receives your request and forwards it to our AI models running in your Isolated VPC. Our models convert your text into speech and respond to you via the network proxy. Your text and speech data never leave your appliance except en route back to you.

What latency can I expect?

Most customers will experience 130ms of time-to-first-audio latency when using A10 or L4 GPUs. Our SLA guarantees a time-to-first-audio latency of 200ms.
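
If you want to check time-to-first-audio from your own client, one approach is to time how long a streaming response takes to deliver its first non-empty chunk. The Python sketch below does this for any iterable of byte chunks. The helper names and the simulated stream are hypothetical; in real use you would pass in the audio stream returned by your client and start timing when the request is sent.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk arrives, that chunk)."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks, if any
            return time.perf_counter() - start, chunk
    raise RuntimeError("stream ended before any audio arrived")


def fake_stream(delay_s: float) -> Iterator[bytes]:
    """Stand-in for a streaming audio response (simulated, not a real API call)."""
    time.sleep(delay_s)  # simulated synthesis plus network delay
    yield b"\x00\x01"
    yield b"\x02\x03"


latency_s, first_chunk = time_to_first_chunk(fake_stream(0.05))
within_sla = latency_s <= 0.200  # 200ms figure taken from the SLA statement above
```

Because the generator is lazy, the simulated delay is incurred inside the timed loop, which mirrors how a real streaming response would behave.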

Does the appliance scale to meet my demand?

Yes.

Your PlayHT On-Prem appliance scales up and down to have enough capacity to meet our SLA given your workload. The capacity manager in your Control VPC sends usage and performance telemetry back to PlayHT, which we use to calculate the right number of GPUs for your workload. Regardless of your workload, your appliance will always have a minimum of 1 GPU provisioned. Reacting to changes in your workload’s demand profile can take several minutes - which may not be fast enough for you. If you’d like to configure a higher minimum number of GPUs, please reach out to [email protected].
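
The sizing logic can be pictured as a clamp between a minimum and a quota. The Python sketch below is an illustration only: PlayHT does not describe its actual formula here, so the function name, the use of concurrent requests as the demand signal, and the per-GPU capacity parameter are all assumptions. The default minimum of 1 GPU and the default quota of 50 GPUs come from this page.

```python
import math


def target_gpu_count(
    concurrent_requests: int,
    requests_per_gpu: int,  # hypothetical capacity of a single GPU
    min_gpus: int = 1,      # the appliance always keeps at least 1 GPU
    max_gpus: int = 50,     # default quota described under "Can I control hardware costs?"
) -> int:
    """Estimate how many GPUs to provision, clamped to [min_gpus, max_gpus]."""
    if requests_per_gpu <= 0:
        raise ValueError("requests_per_gpu must be positive")
    if min_gpus > max_gpus:
        raise ValueError("min_gpus must not exceed max_gpus")
    needed = math.ceil(max(concurrent_requests, 0) / requests_per_gpu)
    return max(min_gpus, min(needed, max_gpus))


idle = target_gpu_count(0, requests_per_gpu=8)
moderate = target_gpu_count(20, requests_per_gpu=8)
capped = target_gpu_count(1000, requests_per_gpu=8)
```

Raising the minimum trades idle hardware cost for headroom: with a higher floor, a sudden burst is absorbed by GPUs that are already running rather than waiting several minutes for a scale-up.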

Can I control hardware costs?

Yes.

Your PlayHT On-Prem appliance starts with a quota of 50 GPUs. If you’d like to change your quota, please reach out to [email protected].

More generally, your appliance runs in your cloud on hardware you own and control, so the existing mechanisms you have to control cloud costs continue to work as-is. For example, you can use your cloud provider’s native quota/cost-control features to limit hardware costs.

What if I can’t get GPU capacity from my cloud provider?

Please reach out to [email protected]. We can assist in the conversation with your cloud provider to get more GPU capacity.

Can I see or download PlayHT’s AI models?

No.

Our AI models run on locked-down, GPU-equipped virtual machines inside the Isolated VPC in your cloud. You cannot log on to those machines or inspect the memory/disk contents.

How can I make sure that my text/speech data isn’t leaving my cloud?

Your appliance runs in your cloud in VPCs that you own and control. You have full access to the VPCs, so you can inspect network configuration and firewall rules, and analyze traffic flow logs. You also have full access to the components running inside the Control VPC (which are all open source), so you can verify that they are enforcing policy correctly. As described above, your appliance sends performance and usage telemetry back to PlayHT’s cloud. It also sends requests to our model repository to download our most recently trained AI models. All network traffic leaving your appliance is added to your audit log. The only exception is text/speech data sent to your own API clients.
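
One practical way to use the audit log is to scan it periodically for destinations you do not expect. The Python sketch below assumes each audit entry has already been exported as a dictionary with a destination_host field; that record shape, the field name, and the example hostnames are hypothetical, so adapt them to whatever your logging service actually emits.

```python
from collections import Counter
from typing import Iterable, Mapping, Set


def unexpected_destinations(
    entries: Iterable[Mapping[str, str]],
    expected_hosts: Set[str],
) -> Counter:
    """Count audit-log entries whose destination is not in the expected set."""
    counts: Counter = Counter()
    for entry in entries:
        # Entries without a destination are flagged rather than silently skipped.
        host = entry.get("destination_host", "<missing>")
        if host not in expected_hosts:
            counts[host] += 1
    return counts


# Hypothetical exported entries and expected hosts (telemetry and model downloads).
sample_entries = [
    {"destination_host": "telemetry.example.com"},
    {"destination_host": "models.example.com"},
    {"destination_host": "unknown.example.net"},
    {"destination_host": "unknown.example.net"},
    {},
]
expected = {"telemetry.example.com", "models.example.com"}
findings = unexpected_destinations(sample_entries, expected)
```

An empty result means every logged outbound request went to a destination you expected; anything else is a starting point for a closer look at the traffic flow logs.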

What happens if I change my appliance’s cloud configuration by accident? For example, what happens if I delete a virtual machine from the Control VPC?

It's important that you are in control of your infrastructure. But changing your appliance’s cloud configuration could have unpredictable results. If you do this by accident and your appliance stops working, please contact [email protected].