Converts the provided text to speech audio using the specified voice. The response is streamed with the "Transfer-Encoding: chunked" header. We currently target a p90 first-chunk latency of around 300 ms in Pakistan, and we are actively working on reducing it.
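Because the audio arrives in chunks, a client can start writing or playing it before synthesis has finished. The standard-library Python sketch below reads any file-like response in fixed-size pieces and records the time until the first non-empty chunk, so you can compare your own first-chunk latency against the p90 target. `iter_chunks` and `save_stream` are hypothetical helper names, not part of this API. With `urllib`, the response object returned by `urlopen` decodes the chunked transfer encoding for you, so `read()` yields plain audio bytes.

```python
import io
import time
from typing import BinaryIO, Iterator, Optional, Tuple


def iter_chunks(resp: BinaryIO, size: int = 4096) -> Iterator[bytes]:
    """Yield successive pieces of a file-like response until EOF."""
    while True:
        chunk = resp.read(size)
        if not chunk:  # b"" means the stream has ended
            return
        yield chunk


def save_stream(
    resp: BinaryIO,
    out: BinaryIO,
    start: Optional[float] = None,
    size: int = 4096,
) -> Tuple[int, Optional[float]]:
    """Copy audio from resp to out.

    Returns (bytes written, seconds from `start` to the first chunk).
    Pass start=time.monotonic() taken just before sending the request to
    measure end-to-end first-chunk latency; by default timing starts here.
    """
    if start is None:
        start = time.monotonic()
    first_chunk_s: Optional[float] = None
    total = 0
    for chunk in iter_chunks(resp, size):
        if first_chunk_s is None:
            first_chunk_s = time.monotonic() - start
        out.write(chunk)  # e.g. a file or an audio player's input buffer
        total += len(chunk)
    return total, first_chunk_s


if __name__ == "__main__":
    # Offline demo: BytesIO stands in for the HTTP response object.
    n, first = save_stream(io.BytesIO(b"\x00" * 10_000), io.BytesIO())
    print(n, first is not None)  # prints: 10000 True
```

Writing each chunk as soon as it arrives, rather than buffering the whole body, is what lets playback begin at roughly the first-chunk latency instead of the full synthesis time.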
For best results, write Urdu text in Urdu script. For better pronunciation of English words, write them in ASCII (Latin) characters. Example: “یہ ایک exerted force ہے”
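A quick client-side pre-check can catch text that follows neither convention, such as English words containing accented Latin letters. The Python sketch below is a heuristic of our own, not something the API performs: it assumes Urdu characters fall in the Arabic Unicode block (U+0600-U+06FF) and flags every other non-ASCII character for review. It may also flag legitimate characters such as curly quotes or the zero-width non-joiner, so treat its output as a warning rather than a rejection.

```python
from typing import List

# Arabic Unicode block (U+0600-U+06FF); Urdu letters such as ک, ی, ہ, ے live here.
ARABIC_BLOCK = range(0x0600, 0x0700)


def unexpected_chars(text: str) -> List[str]:
    """Return the distinct characters that are neither ASCII nor Arabic-block.

    Heuristic: Urdu words should be in Urdu script and English words in
    plain ASCII, so anything else is returned (sorted) for review.
    """
    return sorted({ch for ch in text if ord(ch) >= 128 and ord(ch) not in ARABIC_BLOCK})


if __name__ == "__main__":
    print(unexpected_chars("یہ ایک exerted force ہے"))  # prints: []
```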
Returns the audio data directly in the response.
API key with format "Bearer sk_api_..."
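A minimal Python sketch of the request headers, assuming the key is sent in the standard `Authorization` header (the usual place for a "Bearer" token) and that the request body is JSON; both are assumptions, as is the helper name `auth_headers`. The prefix check catches a common mistake: passing a value that already starts with "Bearer ", which would produce "Bearer Bearer sk_api_...".

```python
from typing import Dict


def auth_headers(api_key: str) -> Dict[str, str]:
    """Build request headers from the raw key (which starts with 'sk_api_')."""
    if not api_key.startswith("sk_api_"):
        # Rejects keys that already include "Bearer " or look malformed.
        raise ValueError("expected the raw key, starting with 'sk_api_'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumption: JSON request body
    }
```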
Request for text-to-speech synthesis
Identifier for the voice to use. Available options: v_8eelc901 (Info/Edu), v_kwmp7zxt (Gen Z), v_yypgzenx (Dada Jee), v_30s70t3a (Nostalgic News).
The text to synthesize. Maximum length: 2500.
Format of the output audio. WAV files are usually about 10x larger than MP3 or OGG, so we recommend MP3 or OGG for the best compression while maintaining quality. Available options: WAV_22050_16, WAV_22050_32, MP3_22050_32, MP3_22050_64, MP3_22050_128, OGG_22050_16, ULAW_8000_8.
Optional ID of a phrase replacement configuration to apply.
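The Python sketch below assembles and validates a request body from the fields above before anything is sent. The JSON field names (`voiceId`, `text`, `outputFormat`, `phraseReplacementConfigId`) are hypothetical, since this page does not show them; check them against the API schema. It also assumes the 2500 limit applies to the character length of `text`, and the MP3_22050_64 default is our own choice, following the MP3/OGG recommendation.

```python
from typing import Any, Dict, Optional

VOICES = {
    "v_8eelc901": "Info/Edu",
    "v_kwmp7zxt": "Gen Z",
    "v_yypgzenx": "Dada Jee",
    "v_30s70t3a": "Nostalgic News",
}
OUTPUT_FORMATS = {
    "WAV_22050_16", "WAV_22050_32", "MP3_22050_32", "MP3_22050_64",
    "MP3_22050_128", "OGG_22050_16", "ULAW_8000_8",
}
MAX_TEXT_LEN = 2500  # assumption: the documented limit applies to len(text)


def build_payload(
    voice_id: str,
    text: str,
    output_format: str = "MP3_22050_64",  # assumption: our default, not the API's
    phrase_config_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the inputs and return a request-body dict (hypothetical field names)."""
    if voice_id not in VOICES:
        raise ValueError(f"unknown voice: {voice_id}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format}")
    if not text or len(text) > MAX_TEXT_LEN:
        raise ValueError(f"text must be 1..{MAX_TEXT_LEN} characters")
    payload: Dict[str, Any] = {
        "voiceId": voice_id,
        "text": text,
        "outputFormat": output_format,
    }
    if phrase_config_id is not None:  # optional field: omitted when unused
        payload["phraseReplacementConfigId"] = phrase_config_id
    return payload
```

Validating locally turns a typo in a voice ID or format name into an immediate error instead of a failed round trip.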
Successful audio synthesis
The response is of type file.
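Since a successful response body is the audio file itself, a client only needs to POST the request and copy the body to disk. The Python sketch below uses only `urllib`; `TTS_URL` is a hypothetical placeholder (this page does not give the endpoint), and the JSON body and `Authorization: Bearer` header are assumptions. `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so an error body is never saved as audio.

```python
import json
import shutil
import urllib.request
from typing import Any, Dict

# Hypothetical placeholder: replace with the real endpoint URL.
TTS_URL = "https://api.example.com/text-to-speech"


def make_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request whose body is the UTF-8 JSON-encoded payload."""
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def download(req: urllib.request.Request, path: str, timeout: float = 30.0) -> None:
    """Send the request and write the returned audio bytes to path."""
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(path, "wb") as f:
        shutil.copyfileobj(resp, f)  # copies the (already de-chunked) body incrementally
```

For example, once `TTS_URL` points at the real endpoint, `download(make_request(key, payload), "out.mp3")` saves the audio; match the file extension to the requested output format.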