POST /synthesis/text-to-speech/stream-async
Streaming async TTS
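import requests

# Step 1: Initiate streaming async TTS
url = "https://api.upliftai.org/v1/synthesis/text-to-speech/stream-async"

payload = {
    "voiceId": "v_meklc281",
    # Urdu: "Hello, this is about the history of Pakistan."
    "text": "سلام، یہ پاکستان کی تاریخ کے بارے میں ہے۔",
    "outputFormat": "MP3_22050_128",
}
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY",
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # fail fast on 4xx/5xx instead of parsing an error body
result = response.json()

# Step 2: Stream the audio; the first chunk arrives in ~300ms
media_id = result["mediaId"]
token = result["token"]

audio_url = f"https://api.upliftai.org/v1/synthesis/stream-audio/{media_id}?token={token}"

# This URL supports chunked streaming, so it can be handed to an audio player
# or read incrementally. Assumption: it accepts a plain GET request. The audio
# is written to disk chunk by chunk; "output.mp3" is an illustrative file name.
with requests.get(audio_url, stream=True) as audio:
    audio.raise_for_status()
    with open("output.mp3", "wb") as f:
        for chunk in audio.iter_content(chunk_size=8192):
            if chunk:  # skip keep-alive chunks
                f.write(chunk)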

Example response:

{
  "mediaId": "media_abc123xyz",
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Authorizations

Authorization (string, header, required)

API key, sent in the format "Bearer sk_api_...".
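A minimal sketch of building the request headers so the key always carries the required "Bearer " prefix. The helper auth_headers and the environment variable name UPLIFTAI_API_KEY are hypothetical, chosen for illustration only.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build JSON request headers with a Bearer Authorization value.

    UPLIFTAI_API_KEY is a hypothetical environment variable name; pass
    api_key explicitly to skip the environment lookup.
    """
    key = api_key or os.environ.get("UPLIFTAI_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    # The header value must have the form "Bearer sk_api_...".
    if not key.startswith("Bearer "):
        key = f"Bearer {key}"
    return {
        "Content-Type": "application/json",
        "Authorization": key,
    }
```

Accepting keys with or without the prefix avoids accidentally sending "Bearer Bearer ...".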

Body (application/json)

Request body for asynchronous text-to-speech synthesis.

text (string, required)

The text to synthesize. Maximum length: 2500 characters.

Example: "سلام، آپ اِس وقت اوریٹر کی آواز سن رہے ہیں۔" ("Hello, you are now listening to the voice of Orator.")

outputFormat (enum<string>, required)

Format of the output audio. WAV files are typically about 10x larger than MP3 or OGG, so we recommend MP3 or OGG for the best compression while maintaining quality.

Available options: PCM_22050_16, WAV_22050_16, WAV_22050_32, MP3_22050_32, MP3_22050_64, MP3_22050_128, OGG_22050_16, ULAW_8000_8
voiceId (string)

Identifier for the voice to use. Named voices:
- v_meklc281 (Urdu female)
- v_8eelc901 (Info/Edu)
- v_kwmp7zxt (Gen Z)
- v_yypgzenx (Dada Jee)
- v_30s70t3a (Nostalgic News)

Example: "v_meklc281"

phraseReplacementConfigId (string)

Optional ID of a phrase replacement configuration to apply.
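A minimal sketch of assembling and validating a request body from the fields above. build_tts_payload and OutputFormat are hypothetical helpers, not part of any SDK. The length check assumes the 2500 limit counts characters the way Python's len() does (Unicode code points), which is an assumption about the server.

```python
from enum import Enum
from typing import Dict, Optional, Union

MAX_TEXT_LENGTH = 2500  # documented maximum length of the "text" field


class OutputFormat(str, Enum):
    """The outputFormat values listed in this reference."""
    PCM_22050_16 = "PCM_22050_16"
    WAV_22050_16 = "WAV_22050_16"
    WAV_22050_32 = "WAV_22050_32"
    MP3_22050_32 = "MP3_22050_32"
    MP3_22050_64 = "MP3_22050_64"
    MP3_22050_128 = "MP3_22050_128"
    OGG_22050_16 = "OGG_22050_16"
    ULAW_8000_8 = "ULAW_8000_8"


def build_tts_payload(
    text: str,
    output_format: Union[OutputFormat, str],
    voice_id: Optional[str] = None,
    phrase_replacement_config_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build a body for POST /synthesis/text-to-speech/stream-async."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    # OutputFormat(...) raises ValueError for values not in the enum.
    payload = {"text": text, "outputFormat": OutputFormat(output_format).value}
    # Optional fields are included only when a value is supplied.
    if voice_id is not None:
        payload["voiceId"] = voice_id
    if phrase_replacement_config_id is not None:
        payload["phraseReplacementConfigId"] = phrase_replacement_config_id
    return payload
```

The returned dict can be passed as the json= argument of requests.post; validating locally catches an oversized text or a misspelled format before a request is sent.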

Response

Successfully initiated streaming synthesis. The response contains a mediaId and a token for retrieving the synthesized audio.

mediaId (string)

Unique identifier for the generated audio media.

Example: "media_abc123xyz"

token (string)

JWT for secure retrieval of the audio.

Example: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
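The two response fields can be wrapped in a small typed object that also assembles the stream URL. This is a sketch: StreamTicket and from_response are hypothetical names, and the stream-audio base URL and token query parameter are copied from the example request at the top of this page rather than from a separate specification.

```python
from dataclasses import dataclass
from typing import Any, Dict
from urllib.parse import quote, urlencode

# Base URL as used in the example request on this page.
STREAM_AUDIO_BASE = "https://api.upliftai.org/v1/synthesis/stream-audio"


@dataclass(frozen=True)
class StreamTicket:
    """mediaId and token returned by the stream-async endpoint."""
    media_id: str
    token: str

    @classmethod
    def from_response(cls, body: Dict[str, Any]) -> "StreamTicket":
        # Both fields are present in a successful response; a KeyError
        # here usually means the request failed.
        return cls(media_id=body["mediaId"], token=body["token"])

    def audio_url(self) -> str:
        # Percent-encode defensively; media IDs and JWTs are normally URL-safe.
        path = quote(self.media_id, safe="")
        query = urlencode({"token": self.token})
        return f"{STREAM_AUDIO_BASE}/{path}?{query}"
```

Freezing the dataclass keeps the mediaId and token paired once they are parsed from the response.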
