This endpoint initiates streaming text-to-speech synthesis and immediately returns a mediaId and a token. Unlike regular async TTS, it enables chunked streaming, with roughly 300 ms of latency to the first chunk when the audio is retrieved.
When to use this endpoint:
For best results with Urdu, use Urdu script. For English words within Urdu text, use ASCII characters. Example: “یہ ایک exerted force ہے”
The audio streams progressively when retrieved via the /stream-audio endpoint.
API key with format "Bearer sk_api_..."
Request for asynchronous text-to-speech synthesis
The text to synthesize
Maximum length: 2500
Example: "سلام، آپ اِس وقت اوریٹر کی آواز سن رہے ہیں۔"
Format of the output audio. WAV files are usually about 10x larger, so we recommend MP3 or OGG for the best compression while maintaining quality.
Available options: PCM_22050_16, WAV_22050_16, WAV_22050_32, MP3_22050_32, MP3_22050_64, MP3_22050_128, OGG_22050_16, ULAW_8000_8
Identifier for the voice to use. Named voices: v_meklc281 (Urdu female), v_8eelc901 (Info/Edu), v_kwmp7zxt (Gen Z), v_yypgzenx (Dada Jee), v_30s70t3a (Nostalgic News)
"v_meklc281"
Optional ID of a phrase replacement configuration to apply
Successfully initiated streaming synthesis
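The request described above could be assembled roughly as follows. This is a minimal sketch, not the documented schema: the JSON field names (text, format, voiceId, phraseReplacementConfigId), the Authorization header name, and the reading of 2500 as a character limit are all assumptions made for illustration. Only the format values, the voice identifier, and the "Bearer sk_api_..." key format come from this reference.

```python
import json

# Format values copied from the list in this reference.
ALLOWED_FORMATS = {
    "PCM_22050_16", "WAV_22050_16", "WAV_22050_32", "MP3_22050_32",
    "MP3_22050_64", "MP3_22050_128", "OGG_22050_16", "ULAW_8000_8",
}

# Assumption: the 2500 shown for the text field is a maximum character count.
MAX_TEXT_LENGTH = 2500


def build_stream_request(api_key, text, voice_id="v_meklc281",
                         audio_format="MP3_22050_64",
                         phrase_replacement_config_id=None):
    """Return (headers, body_bytes) for a streaming synthesis request.

    Field and header names are hypothetical; check them against the
    actual request schema before use.
    """
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")

    headers = {
        # Key format from this reference: "Bearer sk_api_..."
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"text": text, "format": audio_format, "voiceId": voice_id}
    if phrase_replacement_config_id is not None:
        body["phraseReplacementConfigId"] = phrase_replacement_config_id

    # ensure_ascii=False keeps Urdu script readable in the encoded payload.
    return headers, json.dumps(body, ensure_ascii=False).encode("utf-8")
```

The returned headers and body can be passed to any HTTP client as a POST to this endpoint. A successful response should carry the mediaId and token, which are then used to retrieve the audio via /stream-audio.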