Converts the provided text to speech audio using the specified voice. The response is streamed with the "Transfer-Encoding: chunked" header. We currently target a p90 first-chunk latency of around 300 ms in Pakistan and are actively working on reducing it.
For best results, write the input text in Urdu script. For better pronunciation of English words, write them in ASCII characters. Example: “یہ ایک exerted force ہے”
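As a rough illustration of the mixed-script guidance above, the following Python sketch labels each whitespace-separated token as Urdu script, ASCII, or something else, so unexpected tokens can be spotted before sending text for synthesis. It is not part of the API: the function name `classify_tokens` is hypothetical, and "Urdu script" is approximated here by the Unicode Arabic blocks, which is an assumption.

```python
import re

# Assumption: Urdu script is approximated by the Unicode Arabic blocks
# (Arabic, Arabic Supplement, and the two Presentation Forms blocks).
ARABIC_SCRIPT = re.compile(r"[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF]")
# A plain ASCII word, optionally with an apostrophe part (e.g. "don't").
ASCII_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def classify_tokens(text):
    """Label each whitespace-separated token as 'urdu', 'ascii', or 'other'.

    Hypothetical helper for illustration only; the API does not require it.
    """
    labels = []
    for token in text.split():
        if ARABIC_SCRIPT.search(token):
            labels.append((token, "urdu"))
        elif ASCII_WORD.fullmatch(token):
            labels.append((token, "ascii"))
        else:
            labels.append((token, "other"))
    return labels
```

Tokens labelled "other" (digits, punctuation-only tokens, other scripts) are not necessarily wrong; the sketch only makes them visible for review.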
Returns the audio data directly in the response.
API key in the format "Bearer sk_api_..."
Request for text-to-speech synthesis
Successful audio synthesis
The response is of type file
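Because the audio arrives as a chunked stream, a client can write it out as it is received rather than waiting for the full body. The Python sketch below shows one way to do this with the standard library. The endpoint URL, the JSON field names `text` and `voice`, and the use of the `Authorization` header are assumptions made for illustration; this section does not specify them.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL is not given in this section.
TTS_URL = "https://api.example.com/v1/text-to-speech"


def build_request(api_key, text, voice):
    """Build a POST request carrying the key as "Bearer sk_api_...".

    The JSON field names and the Authorization header are assumptions.
    """
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def copy_stream(response, sink, chunk_size=4096):
    """Copy the response body to sink piece by piece; return bytes written."""
    total = 0
    while True:
        # urllib decodes chunked transfer encoding transparently, so each
        # read() returns raw audio bytes as they become available.
        chunk = response.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def synthesize_to_file(api_key, text, voice, path):
    """Send the request and stream the returned audio into a file."""
    request = build_request(api_key, text, voice)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        return copy_stream(response, out)
```

Writing each piece as it arrives means playback or further processing can begin once the first chunk lands, which is where the first-chunk latency figure matters most.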