Choosing the Right Approach
Quick Rule: Use async TTS when you don’t want audio data passing through your server.
Comparison
Method | Best For | Response Time | How It Works |
---|---|---|---|
Sync /text-to-speech | Direct playback, small texts | ~500ms-2s total | Returns complete audio in response |
Streaming /text-to-speech/stream | Real-time playback through your server | ~300ms first chunk | Streams audio chunks through your server |
Async /text-to-speech-async | Bots, webhooks, CDN delivery | Instant (returns URL) | Returns URL, complete audio available in 1-2s |
Async Streaming /text-to-speech/stream-async | Frontend streaming without proxy | Instant (returns URL) | Returns URL, ~300ms first chunk when retrieved |
When to Use Each Method
Use Async (/text-to-speech-async):
- WhatsApp/Telegram bots that need complete audio files
- Webhook workflows where you process the audio later
- Batch processing multiple texts
Use Async Streaming (/text-to-speech/stream-async):
- Frontend apps that want streaming without a proxy
- When you need low-latency delivery of the first chunk (~300ms)
- Direct client streaming from CDN
- Real-time playback that starts before full generation
Use Regular Streaming (/text-to-speech/stream):
- When you need to process audio through your server
- Adding custom headers or authentication
Use Sync (/text-to-speech):
- Simple, one-time conversions
- Small texts with immediate playback
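The guidance above boils down to two questions: does the audio need to pass through your server, and should playback start before generation finishes? The following is a minimal sketch of that decision, not part of any official SDK. The `Needs` class and `choose_endpoint` function are hypothetical names introduced for illustration; only the endpoint paths come from the comparison table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical description of what a caller needs from TTS."""
    audio_through_server: bool  # must audio pass through your own server?
    play_before_complete: bool  # should playback start before generation finishes?


def choose_endpoint(needs: Needs) -> str:
    """Return the endpoint path suggested by the guidance above."""
    if needs.audio_through_server:
        # Server-side paths: regular streaming for early playback,
        # sync for small, one-time conversions.
        if needs.play_before_complete:
            return "/text-to-speech/stream"
        return "/text-to-speech"
    # URL-based delivery: async streaming for early playback,
    # plain async when a complete audio file is needed.
    if needs.play_before_complete:
        return "/text-to-speech/stream-async"
    return "/text-to-speech-async"
```

For example, a WhatsApp bot that needs a complete file and no proxy would call `choose_endpoint(Needs(audio_through_server=False, play_before_complete=False))`, which selects the async endpoint.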
How Async TTS Works
Simple Example
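A rough sketch of an async request might look like the following. Only the endpoint path and the output format name come from this page. The host (`api.example.invalid`), the Bearer authorization header, the JSON field names (`voiceId`, `text`, `outputFormat`), and the `url` field in the response are all assumptions made for illustration; check the API reference for the real values.

```python
import json
import urllib.request

# Assumptions for illustration only, not documented values:
BASE_URL = "https://api.example.invalid"  # placeholder host
ASYNC_PATH = "/text-to-speech-async"      # endpoint path from the comparison table


def build_async_request(api_key: str, voice_id: str, text: str,
                        output_format: str = "MP3_22050_64") -> urllib.request.Request:
    """Build (but do not send) a POST request for async TTS.

    The JSON field names and the Bearer header are assumed, not confirmed.
    """
    body = json.dumps({
        "voiceId": voice_id,
        "text": text,
        "outputFormat": output_format,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ASYNC_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def request_audio_url(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the audio URL from the JSON response.

    Assumes the response is JSON with a top-level "url" field (hypothetical).
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["url"]
```

Separating request construction from sending keeps the assumed parts (host, headers, field names) in one place, so they are easy to correct once the actual request shape is known. The returned URL can then be handed directly to the client or messaging platform.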
WhatsApp Bot Integration
Key Benefits
No Proxy Needed
Audio goes directly from Uplift AI to your users
Instant Response
Get a URL immediately while the audio generates in the background
Secure Access
JWT tokens ensure that only authorized clients can access the audio
CDN Ready
URLs work with any CDN or caching layer
Voice & Format Options
Use the same voice IDs and output formats as regular TTS:
Output Formats
- MP3_22050_64: Best for messaging apps (smaller files)
- MP3_22050_128: Best quality/size balance
- WAV_22050_32: When you need lossless audio
- ULAW_8000_8: For telephony systems
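As a small illustration, the recommendations above can be captured in a lookup table. The use-case keys (`messaging`, `balanced`, `lossless`, `telephony`) and both function names are hypothetical. `split_format` assumes the identifiers follow a `CODEC_SAMPLERATE_<number>` pattern, which is inferred from the names rather than stated here, so the third field is deliberately left uninterpreted.

```python
# Use-case to format mapping, taken from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
FORMAT_FOR_USE_CASE = {
    "messaging": "MP3_22050_64",   # smaller files
    "balanced": "MP3_22050_128",   # best quality/size balance
    "lossless": "WAV_22050_32",    # lossless audio
    "telephony": "ULAW_8000_8",    # telephony systems
}


def pick_output_format(use_case: str) -> str:
    """Return the recommended output format for a use case."""
    try:
        return FORMAT_FOR_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


def split_format(name: str) -> tuple[str, int]:
    """Split a format identifier into (codec, sample rate in Hz).

    Assumes a CODEC_SAMPLERATE_<number> naming pattern; the meaning of
    the third field is not interpreted.
    """
    codec, rate, _ = name.split("_")
    return codec, int(rate)
```

A telephony integration would call `pick_output_format("telephony")` and pass the result as the output format of the request.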