Async Text to Speech
This endpoint initiates text-to-speech synthesis and immediately returns a mediaId and token. The audio is generated asynchronously and can be retrieved using the returned credentials.
When to use this endpoint:
- Bot integrations (WhatsApp, Telegram, etc.) - Avoid audio passing through your system
- Webhook workflows - When you need to process audio generation separately
- Batch processing - When converting multiple texts without blocking
- Direct client delivery - Let clients fetch audio directly using the secure token
For best results with Urdu, use Urdu script. For English words within Urdu text, use ASCII characters. Example: “یہ ایک exerted force ہے”
The generated audio URL can be shared directly with end users or services without proxying through your server.
Authorizations
API key with format "Bearer sk_api_..."
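As a minimal sketch of the header format described above, the helper below builds the Authorization header from an API key. The function name `auth_headers` is hypothetical, and the `Content-Type` value assumes the request body is JSON, which this page does not state explicitly.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers carrying the API key as a Bearer token."""
    # The documented key format is "Bearer sk_api_...", so reject anything
    # that does not look like an API key before sending it anywhere.
    if not api_key.startswith("sk_api_"):
        raise ValueError("expected an API key starting with 'sk_api_'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumption: JSON request body
    }
```

Checking the prefix locally catches a mis-pasted key before any request is made.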
Body
Request for asynchronous text-to-speech synthesis
The text to synthesize
Maximum length: 2500
Example: "سلام، آپ اِس وقت اوریٹر کی آواز سن رہے ہیں۔"
Format of the output audio. WAV files are usually about 10x larger than compressed output, so we recommend MP3 or OGG for the best compression while maintaining quality.
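The size difference can be estimated with simple arithmetic. The sketch below assumes that a name such as WAV_22050_16 means 22050 Hz at 16 bits per sample, that MP3_22050_32 means an MP3 bitrate of 32 kbps, and that the audio is mono; none of these readings is confirmed on this page, and container overhead is ignored.

```python
def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def size_ratio(sample_rate_hz: int, bits_per_sample: int, compressed_kbps: float) -> float:
    """How many times larger uncompressed audio is than a compressed stream."""
    return pcm_kbps(sample_rate_hz, bits_per_sample) / compressed_kbps


# Assumed reading of WAV_22050_16 versus MP3_22050_32 (mono).
wav_vs_mp3_32 = size_ratio(22050, 16, 32)
```

Under these assumptions, 16-bit audio at 22050 Hz needs 352.8 kbps, roughly 11 times the 32 kbps MP3 option, which is in line with the "10x" figure quoted above.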
Available options: PCM_22050_16, WAV_22050_16, WAV_22050_32, MP3_22050_32, MP3_22050_64, MP3_22050_128, OGG_22050_16, ULAW_8000_8
Identifier for the voice to use. Named voices: v_meklc281 (Urdu female), v_8eelc901 (Info/Edu), v_kwmp7zxt (Gen Z), v_yypgzenx (Dada Jee), v_30s70t3a (Nostalgic News)
"v_meklc281"
Optional ID of a phrase replacement configuration to apply
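The body parameters above could be assembled and validated on the client along these lines. This is only a sketch: the JSON field names (`text`, `voiceId`, `format`, `phraseReplacementConfigId`) and the function name are hypothetical, since this page does not show them, and the check against 2500 assumes that number is a character limit. The format list and the default voice ID are taken from the page.

```python
import json
from typing import Optional

MAX_TEXT_LENGTH = 2500  # assumption: the "2500" above is a character limit

OUTPUT_FORMATS = {
    "PCM_22050_16", "WAV_22050_16", "WAV_22050_32", "MP3_22050_32",
    "MP3_22050_64", "MP3_22050_128", "OGG_22050_16", "ULAW_8000_8",
}


def build_tts_body(
    text: str,
    voice_id: str = "v_meklc281",
    output_format: str = "MP3_22050_64",
    replacement_config_id: Optional[str] = None,
) -> str:
    """Validate the inputs and return the request body as a JSON string."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")

    # Hypothetical field names; check the API schema for the real ones.
    body = {"text": text, "voiceId": voice_id, "format": output_format}
    if replacement_config_id is not None:
        body["phraseReplacementConfigId"] = replacement_config_id

    # ensure_ascii=False keeps Urdu script readable in the serialized body.
    return json.dumps(body, ensure_ascii=False)
```

Validating before sending avoids a round trip for requests that would be rejected anyway, such as an over-long text or a misspelled format name.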
Response
Successfully initiated audio synthesis
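Because the audio is generated asynchronously, the response carries only the credentials needed to fetch it later. A minimal sketch of reading them is shown below, assuming the response is a flat JSON object with the keys `mediaId` and `token` (the two names mentioned in the endpoint description). The `MediaHandle` class and `parse_async_tts_response` function are hypothetical, and the retrieval URL is not covered here because this page does not give it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaHandle:
    """Credentials for fetching the audio once it has been generated."""
    media_id: str
    token: str


def parse_async_tts_response(raw: str) -> MediaHandle:
    """Extract mediaId and token from the JSON response body."""
    data = json.loads(raw)
    # Assumption: both values sit at the top level of the response object.
    missing = [key for key in ("mediaId", "token") if not data.get(key)]
    if missing:
        raise ValueError(f"response is missing: {', '.join(missing)}")
    return MediaHandle(media_id=str(data["mediaId"]), token=str(data["token"]))
```

In a bot integration, the returned handle can be passed straight to the client or messaging platform, so the audio itself never has to pass through your server.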
