This endpoint initiates streaming text-to-speech synthesis and immediately returns a mediaId and token. Unlike regular async TTS, it supports chunked streaming: once retrieval begins, the first audio chunk arrives in roughly 300 ms.
When to use this endpoint:
For best results with Urdu, use Urdu script. For English words within Urdu text, use ASCII characters. Example: “یہ ایک exerted force ہے”
The audio streams progressively when retrieved via the /stream-audio endpoint.
API key in the format "Bearer sk_api_..."
Request for asynchronous text-to-speech synthesis
Successfully initiated streaming synthesis
Response containing mediaId and token for retrieving synthesized audio
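The two-step flow described here (initiate synthesis, then retrieve audio from /stream-audio using the returned mediaId and token) might be sketched as follows. This is a minimal illustration under several assumptions not confirmed by this documentation: the host `api.example.com`, the initiation path `/stream-tts`, the `text` request field, sending the key in an `Authorization` header, and passing mediaId and token as query parameters are all hypothetical. Only the /stream-audio path, the "Bearer sk_api_..." key format, and the mediaId/token response fields come from the text above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: the documentation does not give the host, the
# initiation path, or the request field names.
BASE_URL = "https://api.example.com"
INIT_PATH = "/stream-tts"        # assumed path for this endpoint
STREAM_PATH = "/stream-audio"    # named in the documentation


def build_init_request(api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts synthesis."""
    # "text" is an assumed field name for the request body.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + INIT_PATH,
        data=body,
        method="POST",
        headers={
            # Assumes the key is sent in the Authorization header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def build_stream_url(init_response: dict) -> str:
    """Build the retrieval URL from the mediaId and token in the response."""
    # Passing both values as query parameters is an assumption.
    query = urllib.parse.urlencode(
        {"mediaId": init_response["mediaId"], "token": init_response["token"]}
    )
    return f"{BASE_URL}{STREAM_PATH}?{query}"


def stream_audio(url: str, out_path: str, chunk_size: int = 4096) -> int:
    """Write audio to disk chunk by chunk as it arrives; return bytes written."""
    written = 0
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as out:
        while chunk := resp.read(chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written
```

To use it, send the request from `build_init_request` with `urllib.request.urlopen`, parse the JSON body, pass it to `build_stream_url`, and hand the resulting URL to `stream_audio`. Reading in fixed-size chunks rather than calling `read()` once is what lets a client start handling audio before synthesis has finished.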