This endpoint triggers a text-to-speech conversion.

It can generate audio for Standard & Premium (S&P) voices. The identifiers for these voices look like en-US-JennyNeural.

If you are using PlayHT voices (their identifiers look like larry or a URL), please refer to the Generate Audio From Text endpoint page.

The request body for this /v1/convert endpoint must contain the SSML to be converted to speech, along with the voice to be used for the conversion.
The response contains JSON data about the conversion job that was created.

To generate audio from SSML, please provide the ssml field in the request body. It is an array of SSML strings. An SSML string looks like: <speak><p>Hello my friend <break time="0.5s"/></p></speak>.
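As a rough illustration of the request body described above, the sketch below builds a POST request for /v1/convert with an ssml array and a voice. Several details are assumptions rather than facts from this page: the base URL (api.example.com is a placeholder), the voice field name, and the Authorization and X-User-ID header names. Check them against your account's API settings before relying on them.

```python
import json
import urllib.request

# Assumption: placeholder base URL; substitute the real API host.
API_BASE = "https://api.example.com"


def build_convert_request(ssml_strings, voice, api_key, user_id):
    """Build (but do not send) a POST request for the /v1/convert endpoint."""
    # The ssml field is an array of SSML strings, so reject a bare string.
    if isinstance(ssml_strings, str):
        raise TypeError("ssml must be a list of SSML strings, not a single string")
    if not ssml_strings:
        raise ValueError("ssml must contain at least one SSML string")

    # Assumption: the voice is sent in a field named "voice".
    payload = {"ssml": list(ssml_strings), "voice": voice}

    return urllib.request.Request(
        url=f"{API_BASE}/v1/convert",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: header names used for authentication.
            "Authorization": api_key,
            "X-User-ID": user_id,
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling build_convert_request(['<speak><p>Hello my friend <break time="0.5s"/></p></speak>'], "en-US-JennyNeural", api_key, user_id) and passing the result to submit would return the JSON description of the conversion job, provided the assumed URL and headers match the real service.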

Use the transcriptionId in the response to check the conversion status with the Get Article Conversion Status endpoint.
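Checking the status usually means polling until the job finishes. The sketch below shows one way to do that. It is not tied to the real status endpoint: fetch_status is a hypothetical callable you supply that takes a transcriptionId and returns the decoded JSON status, and the converted flag it inspects is an assumed field name that may differ in the actual response.

```python
import time


def wait_for_conversion(transcription_id, fetch_status,
                        interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Poll a status callable until the conversion is reported as done.

    fetch_status is a hypothetical function: it receives the transcriptionId
    and returns the status response as a dict.
    """
    for attempt in range(max_attempts):
        status = fetch_status(transcription_id)
        # Assumption: a truthy "converted" field marks a finished job.
        if status.get("converted"):
            return status
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(
        f"conversion {transcription_id} not finished after {max_attempts} checks"
    )
```

Passing the status lookup in as a callable keeps the polling loop independent of the HTTP client and makes it easy to exercise with a stub.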
