Streaming async TTS

import requests
import json

# Step 1: Initiate streaming async TTS
url = "https://api.upliftai.org/v1/synthesis/text-to-speech/stream-async"

payload = json.dumps({
  "voiceId": "v_meklc281",
  # "Hello, this is about the history of Pakistan."
  "text": "سلام، یہ پاکستان کی تاریخ کے بارے میں ہے۔",
  "outputFormat": "MP3_22050_128"
})
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer YOUR_API_KEY'
}

response = requests.post(url, headers=headers, data=payload)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
result = response.json()

# Step 2: Build the streaming URL from the returned mediaId and token
media_id = result['mediaId']
token = result['token']

audio_url = f"https://api.upliftai.org/v1/synthesis/stream-audio/{media_id}?token={token}"

# This URL supports chunked streaming; the first chunk arrives in ~300 ms.

Example response:

{
  "mediaId": "media_abc123xyz",
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Async Streaming Text to Speech

Asynchronous streaming text-to-speech for direct client delivery

This endpoint starts streaming text-to-speech synthesis and immediately returns a mediaId and a token. Unlike the regular async TTS endpoint, the audio can then be retrieved as a chunked stream, with the first chunk arriving in about 300 ms.

When to use this endpoint:

Frontend streaming - Stream audio directly to browsers without a proxy
Low-latency playback - Start playing audio before full generation completes
CDN streaming - Progressive download through content delivery networks
Mobile apps - Reduce initial buffering time
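For the frontend case, a natural split is that the server calls stream-async with its API key and hands the browser only the mediaId and retrieval token, which a plain `<audio>` element can play. The sketch below renders such an element. It assumes the tokenized /stream-audio URL may be fetched directly by the browser (as "without a proxy" suggests) and that the browser can decode the chosen outputFormat (for example MP3); the function name `audio_tag` is hypothetical.

```python
import html
import urllib.parse

STREAM_AUDIO_BASE = "https://api.upliftai.org/v1/synthesis/stream-audio"


def audio_tag(media_id: str, token: str) -> str:
    """Return an HTML <audio> element that streams straight from the API.

    Hypothetical helper: the server keeps its API key and passes only the
    mediaId and retrieval token from the stream-async response to the page.
    """
    # urlencode escapes the token defensively; a JWT is normally URL-safe already.
    query = urllib.parse.urlencode({"token": token})
    url = f"{STREAM_AUDIO_BASE}/{urllib.parse.quote(media_id)}?{query}"
    # Escape the URL for use inside a double-quoted HTML attribute.
    return f'<audio controls src="{html.escape(url, quote=True)}"></audio>'
```

Because the browser requests the URL itself, playback can begin as soon as the first chunk arrives, with no server-side relay.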

For best results with Urdu, write Urdu words in Urdu script and English words within Urdu text in ASCII characters. Example: “یہ ایک exerted force ہے” (“This is an exerted force”)
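The script guidance above can be checked before a request is sent. The sketch below flags any whitespace-separated token containing a character that is neither ASCII nor in the Arabic-script Unicode blocks used for Urdu, which catches accented or full-width Latin letters in English words. The block ranges and the `suspicious_tokens` name are assumptions for illustration, not rules published by the API.

```python
# Arabic, Arabic Supplement, and Arabic Presentation Forms A/B.
# This range list is an assumption chosen for the sketch.
URDU_RANGES = [
    (0x0600, 0x06FF),
    (0x0750, 0x077F),
    (0xFB50, 0xFDFF),
    (0xFE70, 0xFEFF),
]


def is_urdu_char(ch: str) -> bool:
    """True if the character falls in one of the Arabic-script blocks above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in URDU_RANGES)


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens containing characters that are neither ASCII nor Urdu script."""
    return [
        token
        for token in text.split()
        if any(not (ch.isascii() or is_urdu_char(ch)) for ch in token)
    ]
```

An empty result means the text already follows the recommended convention; a token such as "exérted" would be returned so it can be rewritten as "exerted".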

The audio streams progressively when retrieved via the /stream-audio endpoint.
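The sample code builds `audio_url` but stops before fetching it. A minimal sketch of saving the stream as it arrives, using only the standard library's `urllib.request`, is shown below; the output path and function names are hypothetical, and with `requests` the equivalent is `requests.get(audio_url, stream=True)` followed by `iter_content()`.

```python
import urllib.request
from typing import BinaryIO


def copy_in_chunks(source: BinaryIO, sink: BinaryIO, chunk_size: int = 8192) -> int:
    """Copy `source` to `sink` one chunk at a time and return the bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)  # up to chunk_size bytes; b"" at end of stream
        if not chunk:
            break
        sink.write(chunk)  # a player could consume `chunk` here instead
        total += len(chunk)
    return total


def download_stream(audio_url: str, path: str, timeout: float = 30.0) -> int:
    """Fetch the tokenized /stream-audio URL and write it to `path` as it arrives."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(audio_url, timeout=timeout) as resp, open(path, "wb") as out:
        return copy_in_chunks(resp, out)
```

Usage: `download_stream(audio_url, "speech.mp3")`, choosing the file extension to match the requested outputFormat.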

POST /v1/synthesis/text-to-speech/stream-async

Authorizations

Authorization (string, header, required)

API key in the format "Bearer sk_api_..."
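To avoid hard-coding the key as the sample does, headers can be built from the environment. In this sketch the variable name `UPLIFTAI_API_KEY` and the helper `auth_headers` are hypothetical, and the `sk_api_` prefix check assumes every key follows the documented "Bearer sk_api_..." format.

```python
import os
from typing import Optional

# Hypothetical variable name for this sketch; use whatever your deployment defines.
API_KEY_ENV_VAR = "UPLIFTAI_API_KEY"


def auth_headers(api_key: Optional[str] = None) -> dict[str, str]:
    """Build JSON request headers carrying a Bearer API key.

    Falls back to the environment so the key is not hard-coded in source.
    """
    key = api_key if api_key is not None else os.environ.get(API_KEY_ENV_VAR, "")
    # The documented format is "Bearer sk_api_..."; catch obvious mistakes early.
    if not key.startswith("sk_api_"):
        raise ValueError("expected an API key starting with 'sk_api_'")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```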

Body

Content type: application/json

Request for asynchronous text-to-speech synthesis

text (string, required)

The text to synthesize. Maximum length: 2500 characters.

Example: "سلام، آپ اِس وقت اوریٹر کی آواز سن رہے ہیں۔" ("Hello, you are now listening to Orator's voice.")
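Text longer than the 2500-character limit has to be sent as several requests. The sketch below greedily packs whole sentences (ending in the Urdu full stop ۔, the Arabic question mark ؟, or ASCII . ! ?) into chunks no longer than the limit, and falls back to splitting at a space, or a hard cut, when a single sentence is too long. Splitting at sentence boundaries is a choice made here on the assumption that it keeps each request's prosody natural; it is not behavior the API requires.

```python
import re

MAX_TEXT_LENGTH = 2500  # documented limit for the `text` field

# Zero-width split point after each Urdu/English sentence terminator.
_SENTENCE_END = re.compile(r"(?<=[۔؟.!?])")


def _hard_split(piece: str, limit: int) -> list[str]:
    """Split an over-long sentence at the last space before `limit`."""
    parts = []
    while len(piece) > limit:
        cut = piece.rfind(" ", 0, limit)
        if cut <= 0:
            cut = limit  # no usable space: fall back to a hard cut
        parts.append(piece[:cut])
        piece = piece[cut:]
    parts.append(piece)
    return parts


def chunk_text(text: str, limit: int = MAX_TEXT_LENGTH) -> list[str]:
    """Pack sentences into chunks of at most `limit` characters.

    Whitespace is preserved, so "".join(chunks) == text; strip each chunk
    before sending if leading spaces are unwanted.
    """
    chunks = []
    current = ""
    for sentence in _SENTENCE_END.split(text):
        for piece in _hard_split(sentence, limit):
            if len(current) + len(piece) <= limit:
                current += piece
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```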

outputFormat (enum<string>, required)

Format of the output audio. WAV files are typically about 10x larger than the compressed formats, so we recommend MP3 or OGG for the best compression while maintaining quality.

Available options: PCM_22050_16, WAV_22050_16, WAV_22050_32, MP3_22050_32, MP3_22050_64, MP3_22050_128, OGG_22050_16, ULAW_8000_8
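When saving the returned audio, it helps to pick a file extension from the requested format. The mapping below is an assumption: it reads the middle number of each option as the sample rate in Hz (22050 or 8000), treats PCM and ULAW output as raw headerless audio, and leaves the third number unparsed because its meaning differs between codecs. The names `FORMAT_EXTENSIONS` and `describe_format` are hypothetical.

```python
# Assumed file extensions for each documented outputFormat option.
FORMAT_EXTENSIONS = {
    "PCM_22050_16": ".pcm",  # assumed raw samples, no container header
    "WAV_22050_16": ".wav",
    "WAV_22050_32": ".wav",
    "MP3_22050_32": ".mp3",
    "MP3_22050_64": ".mp3",
    "MP3_22050_128": ".mp3",
    "OGG_22050_16": ".ogg",
    "ULAW_8000_8": ".ulaw",  # assumed raw mu-law, as used in telephony
}


def describe_format(fmt: str) -> tuple[str, int, str]:
    """Return (codec, assumed sample rate in Hz, file extension) for an option."""
    if fmt not in FORMAT_EXTENSIONS:
        raise ValueError(f"unsupported outputFormat: {fmt!r}")
    codec, rate, _ = fmt.split("_")
    return codec, int(rate), FORMAT_EXTENSIONS[fmt]
```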

voiceId (string)

Identifier for the voice to use. Named voices: v_meklc281 (Urdu female), v_8eelc901 (Info/Edu), v_kwmp7zxt (Gen Z), v_yypgzenx (Dada Jee), v_30s70t3a (Nostalgic News)

Example: "v_meklc281"

phraseReplacementConfigId (string)

Optional ID of a phrase replacement configuration to apply
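The body fields above can be validated client-side before a request is made. This sketch checks the documented constraints (text required and at most 2500 characters, outputFormat from the listed options) and returns a dict suitable for `requests.post(url, headers=headers, json=body)`. The function name is hypothetical, and leaving optional fields out rather than sending null is a choice made here, not documented API behavior.

```python
from typing import Optional

MAX_TEXT_LENGTH = 2500

OUTPUT_FORMATS = {
    "PCM_22050_16", "WAV_22050_16", "WAV_22050_32", "MP3_22050_32",
    "MP3_22050_64", "MP3_22050_128", "OGG_22050_16", "ULAW_8000_8",
}


def build_payload(
    text: str,
    output_format: str,
    voice_id: Optional[str] = None,
    phrase_replacement_config_id: Optional[str] = None,
) -> dict:
    """Validate the documented body fields and return a JSON-ready dict."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_LENGTH}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {output_format!r}")
    body = {"text": text, "outputFormat": output_format}
    # Optional fields are omitted entirely rather than sent as null.
    if voice_id is not None:
        body["voiceId"] = voice_id
    if phrase_replacement_config_id is not None:
        body["phraseReplacementConfigId"] = phrase_replacement_config_id
    return body
```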

Response

Successfully initiated streaming synthesis. The response contains a mediaId and token for retrieving the synthesized audio.

mediaId (string)

Unique identifier for the generated audio media

Example: "media_abc123xyz"

token (string)

JWT token for secure retrieval of the audio

Example: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
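The sample code indexes `result['mediaId']` directly, which fails with a bare KeyError if a field is absent. A small typed wrapper makes the two documented fields explicit and gives a clearer error. The class name is hypothetical; hiding the token from the default repr is a design choice so the retrieval token does not end up in most log lines.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StreamAsyncResult:
    """Typed view of a successful stream-async response (hypothetical name)."""

    media_id: str
    # repr=False keeps the retrieval token out of the default repr.
    token: str = field(repr=False)

    @classmethod
    def from_json(cls, raw: str) -> "StreamAsyncResult":
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        missing = [
            key for key in ("mediaId", "token")
            if not isinstance(data.get(key), str) or not data[key]
        ]
        if missing:
            raise ValueError(f"response is missing: {', '.join(missing)}")
        return cls(media_id=data["mediaId"], token=data["token"])
```

Usage: `StreamAsyncResult.from_json(response.text)` after `response.raise_for_status()`.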
