Learn when to use async text-to-speech for optimal performance in your applications.

Choosing the Right Approach

Quick Rule: Use async TTS when you don’t want audio data passing through your server.

Comparison

Method | Best For | Response Time | How It Works
Sync (/text-to-speech) | Direct playback, small texts | ~500ms-2s total | Returns complete audio in response
Streaming (/text-to-speech/stream) | Real-time playback through your server | ~300ms first chunk | Streams audio chunks through your server
Async (/text-to-speech-async) | Bots, webhooks, CDN delivery | Instant (returns URL) | Returns URL, complete audio available in 1-2s
Async Streaming (/text-to-speech/stream-async) | Frontend streaming without proxy | Instant (returns URL) | Returns URL, ~300ms first chunk when retrieved

When to Use Each Method

Use Async (/text-to-speech-async):

  • WhatsApp/Telegram bots that need complete audio files
  • Webhook workflows where you process the audio later
  • Batch processing of multiple texts
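
For the batch case, one possible approach is to submit the texts concurrently and collect the returned URLs in order. The sketch below assumes a caller-supplied `synthesize` function (hypothetical, for example a thin wrapper around the async endpoint) that takes a text and returns an audio URL. The worker count is an arbitrary illustration, not a documented limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def batch_synthesize(
    texts: Sequence[str],
    synthesize: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Run `synthesize` over all texts in parallel and return URLs in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of the input sequence.
        return list(pool.map(synthesize, texts))
```

Because each async request returns as soon as the URL is issued, the threads spend their time waiting on the network, which is why a small thread pool is usually enough here.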

Use Async Streaming (/text-to-speech/stream-async):

  • Frontend apps that want streaming without a proxy
  • When you need low-latency first-byte delivery (~300ms)
  • Direct client streaming from CDN
  • Real-time playback that starts before full generation
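
A rough sketch of the backend side of this flow: the server requests a streaming URL and hands only that URL to the frontend, so the audio never passes through the server. The full endpoint URL is inferred from the async URL pattern, and the response shape (`mediaId`, `token`) and the `stream-audio` retrieval path are assumed to match the plain async endpoint. `frontend_payload` and its `audioUrl` key are hypothetical names.

```python
import json
import urllib.request

API_BASE = "https://api.upliftai.org/v1/synthesis"


def build_stream_async_request(
    text: str, api_key: str, voice_id: str = "v_meklc281"
) -> urllib.request.Request:
    """Prepare the POST request for the async streaming endpoint."""
    payload = {"voiceId": voice_id, "text": text, "outputFormat": "MP3_22050_64"}
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/stream-async",  # ASSUMPTION: same base URL as async
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def frontend_payload(result: dict) -> dict:
    """Turn the API response into the small JSON body sent to the browser."""
    # ASSUMPTION: the response carries mediaId and token like the async endpoint.
    url = f"{API_BASE}/stream-audio/{result['mediaId']}?token={result['token']}"
    return {"audioUrl": url}
```

The frontend can then point an audio element at `audioUrl` and start playback as soon as the first chunk arrives.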

Use Regular Streaming (/text-to-speech/stream):

  • When you need to process audio through your server
  • Adding custom headers or authentication
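
When the audio does need to pass through your server, the response can be read in chunks and forwarded as it arrives. This is a minimal sketch using only the standard library. The full URL is an assumption derived from the async endpoint's base path, and the request body is assumed to match the async one.

```python
import json
import urllib.request
from typing import BinaryIO, Iterator

# ASSUMPTION: same base URL as the async endpoint, with the /stream path.
STREAM_URL = "https://api.upliftai.org/v1/synthesis/text-to-speech/stream"


def iter_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def stream_tts(text: str, api_key: str, voice_id: str = "v_meklc281") -> Iterator[bytes]:
    """Yield audio chunks as the API produces them."""
    payload = {"voiceId": voice_id, "text": text, "outputFormat": "MP3_22050_64"}
    request = urllib.request.Request(
        STREAM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        yield from iter_chunks(response)
```

A web framework's streaming response can consume `stream_tts(...)` directly, which is the point where custom headers or authentication checks would be applied.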

Use Sync (/text-to-speech):

  • Simple, one-time conversions
  • Small texts with immediate playback
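
The guidance above can be condensed into a small decision helper. This is only an illustration of the rule of thumb, and the function and parameter names are hypothetical.

```python
def choose_endpoint(audio_through_server: bool, start_before_complete: bool) -> str:
    """Pick a TTS endpoint from two questions.

    audio_through_server: the audio must pass through your own server.
    start_before_complete: playback should begin before generation finishes.
    """
    if audio_through_server:
        return "/text-to-speech/stream" if start_before_complete else "/text-to-speech"
    if start_before_complete:
        return "/text-to-speech/stream-async"
    return "/text-to-speech-async"
```

For example, a WhatsApp bot that needs a complete file and no proxying would call `choose_endpoint(False, False)` and get the async endpoint.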

How Async TTS Works

Simple Example
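
A minimal sketch of the basic flow, using only the standard library: post the text, then build a playback URL from the `mediaId` and `token` in the response. The endpoint, request fields, and URL pattern follow the WhatsApp example in this guide. The helper names and the 30-second timeout are illustrative assumptions.

```python
import json
import urllib.request

API_BASE = "https://api.upliftai.org/v1/synthesis"


def build_audio_url(media_id: str, token: str) -> str:
    """Build the playback URL from the fields of the async response."""
    return f"{API_BASE}/stream-audio/{media_id}?token={token}"


def request_async_tts(text: str, api_key: str, voice_id: str = "v_meklc281") -> str:
    """Request async synthesis and return a URL that clients can fetch directly."""
    payload = {"voiceId": voice_id, "text": text, "outputFormat": "MP3_22050_128"}
    request = urllib.request.Request(
        f"{API_BASE}/text-to-speech-async",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises HTTPError on 4xx/5xx, so a bad key fails here.
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    return build_audio_url(result["mediaId"], result["token"])
```

Calling `request_async_tts("Hello", "YOUR_API_KEY")` returns the URL right away, and the audio itself is fetched later by whoever opens that URL.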

WhatsApp Bot Integration

import requests

def send_voice_to_whatsapp(text: str, phone_number: str):
    # Step 1: Get audio URL from Uplift AI
    response = requests.post(
        "https://api.upliftai.org/v1/synthesis/text-to-speech-async",
        headers={
            'Authorization': 'Bearer YOUR_API_KEY',
            'Content-Type': 'application/json'
        },
        json={
            "voiceId": "v_meklc281",  # Urdu female voice
            "text": text,
            "outputFormat": "MP3_22050_64"  # Smaller for WhatsApp
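        },
        timeout=30  # Avoid hanging indefinitely on a stalled request
    )
    response.raise_for_status()  # Surface auth or validation errors early

    result = response.json()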
    audio_url = f"https://api.upliftai.org/v1/synthesis/stream-audio/{result['mediaId']}?token={result['token']}"
    
    # Step 2: Send URL directly to WhatsApp
    whatsapp_response = requests.post(
        "https://graph.facebook.com/v17.0/YOUR_PHONE_ID/messages",
        headers={'Authorization': 'Bearer WHATSAPP_TOKEN'},
        json={
            "messaging_product": "whatsapp",
            "to": phone_number,
            "type": "audio",
            "audio": {"link": audio_url}  # Direct URL - no download needed!
        }
    )
    
    return whatsapp_response.json()

Key Benefits

  • No Proxy Needed: Audio goes directly from Uplift AI to your users.
  • Instant Response: You get the URL immediately while the audio generates in the background.
  • Secure Access: JWT tokens ensure only authorized access.
  • CDN Ready: URLs work with any CDN or caching layer.

Voice & Format Options

Use the same voice IDs and output formats as regular TTS:

Voices

  • "v_meklc281" - Urdu female voice
  • "v_8eelc901" - Info/Education
  • "v_30s70t3a" - Nostalgic News
  • "v_yypgzenx" - Dada Jee (storytelling)

Output Formats

  • MP3_22050_64 - Best for messaging apps (smaller files)
  • MP3_22050_128 - Best quality/size balance
  • WAV_22050_32 - When you need lossless audio
  • ULAW_8000_8 - For telephony systems
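
One way to keep the format choice out of call sites is a small lookup keyed by use case. The format strings come from the list above. The use-case labels and the function name are hypothetical.

```python
OUTPUT_FORMAT_BY_USE_CASE = {
    "messaging": "MP3_22050_64",   # smaller files for messaging apps
    "general": "MP3_22050_128",    # quality/size balance
    "lossless": "WAV_22050_32",    # lossless audio
    "telephony": "ULAW_8000_8",    # telephony systems
}


def pick_output_format(use_case: str) -> str:
    """Return the outputFormat value suggested for a given use case."""
    try:
        return OUTPUT_FORMAT_BY_USE_CASE[use_case]
    except KeyError:
        known = ", ".join(sorted(OUTPUT_FORMAT_BY_USE_CASE))
        raise ValueError(f"Unknown use case {use_case!r}; expected one of: {known}") from None
```

The returned string can be passed as the `outputFormat` field of the request body.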
