This endpoint triggers a text-to-speech conversion.
It generates audio for Standard & Premium (S&P) voices, whose identifiers look like en-US-JennyNeural.
If you are using PlayHT voices (identifiers that look like larry or a URL), refer to the Generate Audio From Text endpoint page instead.
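If you choose the endpoint programmatically, a rough heuristic can route on the shape of the voice identifier. This is a minimal sketch that assumes, based only on the examples above, that S&P identifiers begin with a language-region prefix such as en-US- and that short names or URLs are PlayHT voices. The helper uses_convert_endpoint and its pattern are hypothetical, not part of the API; an authoritative voice list should take precedence over this guess.

```python
import re

# Hypothetical heuristic: S&P identifiers such as "en-US-JennyNeural" start with a
# language-region prefix; PlayHT identifiers are short names ("larry") or URLs.
SP_VOICE_PATTERN = re.compile(r"[a-z]{2,3}-[A-Z]{2}-[\w-]+")


def uses_convert_endpoint(voice_id: str) -> bool:
    """Return True if the identifier looks like an S&P voice (i.e. use /v1/convert)."""
    if voice_id.startswith(("http://", "https://")):
        return False  # URL-style identifiers belong to PlayHT voices
    return SP_VOICE_PATTERN.fullmatch(voice_id) is not None


for vid in ["en-US-JennyNeural", "larry"]:
    print(vid, "->", "/v1/convert" if uses_convert_endpoint(vid) else "Generate Audio From Text")
```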
The request body for this /v1/convert endpoint must contain the SSML to be converted to speech, along with the voice to use for the conversion.
The response is a JSON object describing the conversion job that was created.
To generate audio from SSML, provide the ssml field in the request body. It is an array of SSML strings; an SSML string looks like: <speak><p>Hello my friend <break time="0.5s"/></p></speak>
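A request combining the voice and the ssml array might be built and sent as sketched here, using only the Python standard library. The body fields voice and ssml follow the description above; the full URL https://play.ht/api/v1/convert and the Authorization and X-User-ID headers are assumptions to check against your account's API credentials page. The helper names build_convert_body and convert are hypothetical.

```python
import json
import urllib.request

# Assumed full URL for the /v1/convert endpoint.
CONVERT_URL = "https://play.ht/api/v1/convert"


def build_convert_body(voice: str, ssml: list[str]) -> dict:
    """Request body: the voice identifier plus an array of SSML strings."""
    return {"voice": voice, "ssml": ssml}


def convert(body: dict, secret_key: str, user_id: str) -> dict:
    """POST the body and return the parsed JSON describing the conversion job."""
    req = urllib.request.Request(
        CONVERT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": secret_key,  # assumed header name for the secret key
            "X-User-ID": user_id,  # assumed header name for the user ID
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = build_convert_body(
    "en-US-JennyNeural",
    ['<speak><p>Hello my friend <break time="0.5s"/></p></speak>'],
)
print(json.dumps(body))
# job = convert(body, "YOUR_SECRET_KEY", "YOUR_USER_ID")  # performs a real network call
```

Keeping body construction separate from the network call makes the payload easy to inspect or log before anything is sent.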
Use the transcriptionId in the response to check the conversion status with the Get Article Conversion Status endpoint.
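As a sketch of the hand-off between the two endpoints, the transcriptionId can be pulled from the conversion response and passed as a query parameter. The status URL https://play.ht/api/v1/articleStatus and the query parameter name are assumptions to confirm on the Get Article Conversion Status page, the sample response is hypothetical (only transcriptionId is documented here), and status_url_from_job is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

# Assumed URL of the Get Article Conversion Status endpoint.
STATUS_URL = "https://play.ht/api/v1/articleStatus"


def status_url_from_job(job: dict) -> str:
    """Build the status-check URL from the conversion job's transcriptionId."""
    transcription_id = job["transcriptionId"]  # raises KeyError if the job has no ID
    return f"{STATUS_URL}?{urlencode({'transcriptionId': transcription_id})}"


# Hypothetical response body, reduced to the one field this page documents.
job = json.loads('{"transcriptionId": "abc123"}')
print(status_url_from_job(job))
```

The resulting URL would then be requested (with the same credentials as the conversion call) until the job reports that it has finished.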