Orator API Endpoints

Async Text to Speech

Asynchronous text-to-speech synthesis for bot and server-side workflows

This endpoint initiates text-to-speech synthesis and immediately returns a mediaId and token. The audio is generated asynchronously and can be retrieved using the returned credentials.

When to use this endpoint:

Bot integrations (WhatsApp, Telegram, etc.) - Keep audio from passing through your own system
Webhook workflows - Handle audio generation as a separate step
Batch processing - Convert multiple texts without blocking
Direct client delivery - Let clients fetch audio directly using the secure token
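
The batch-processing case above can be sketched as a small fan-out helper. This is a minimal sketch and not part of the API: `submit_batch` and its `submit` parameter are hypothetical names, `submit` stands for any function that POSTs one text to this endpoint and returns the parsed JSON response, and the default of 4 workers is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def submit_batch(
    texts: Iterable[str],
    submit: Callable[[str], Dict[str, str]],
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Call `submit` for every text concurrently and return results in input order.

    `submit` is a caller-supplied function (hypothetical here) that sends one
    text to the async endpoint and returns its JSON, e.g. mediaId and token.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as `texts`.
        return list(pool.map(submit, texts))
```

Since the endpoint returns a mediaId and token immediately, each worker should only be occupied for the duration of the HTTP request rather than for the audio generation itself.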

For best results with Urdu, use Urdu script. For English words within Urdu text, use ASCII characters. Example: “یہ ایک exerted force ہے”

The generated audio URL can be shared directly with end users or services without proxying through your server.
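
A minimal sketch of building that shareable URL, assuming it follows the `stream-audio` pattern used in the retrieval step of this page's Python example. The helper name `build_audio_url` is hypothetical and not part of the API.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the retrieval step in this page's example.
STREAM_AUDIO_BASE = "https://api.upliftai.org/v1/synthesis/stream-audio"


def build_audio_url(media_id: str, token: str) -> str:
    """Return a retrieval URL that can be handed to a client (hypothetical helper)."""
    if not media_id or not token:
        raise ValueError("media_id and token are both required")
    # Percent-encode both parts so unusual characters cannot break the URL.
    path = quote(media_id, safe="")
    query = urlencode({"token": token})
    return f"{STREAM_AUDIO_BASE}/{path}?{query}"
```

The resulting string carries the token in its query string, so anyone holding the URL can presumably fetch the audio; share it accordingly.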

POST /synthesis/text-to-speech-async

Async TTS with retrieval

import requests

# Step 1: Initiate async TTS
url = "https://api.upliftai.org/v1/synthesis/text-to-speech-async"

payload = {
    "voiceId": "v_meklc281",
    "text": "سلام، یہ پاکستان کی تاریخ کے بارے میں ہے۔",
    "outputFormat": "MP3_22050_128",
}
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# json= serializes the payload and sets Content-Type: application/json
response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # stop here on a 4xx/5xx response
result = response.json()

# Step 2: Retrieve the audio using the returned credentials
media_id = result["mediaId"]
token = result["token"]

audio_url = f"https://api.upliftai.org/v1/synthesis/stream-audio/{media_id}"

# Get the audio; params= URL-encodes the token into the query string
audio_response = requests.get(audio_url, params={"token": token}, timeout=60)
audio_response.raise_for_status()

# Save to file
with open("output.mp3", "wb") as f:
    f.write(audio_response.content)

Example response:

{
  "mediaId": "media_abc123xyz",
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Authorizations

Authorization (string, header, required)

API key with format "Bearer sk_api_..."
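
A small sketch of building the header in that format. The helper name `auth_headers` is hypothetical, and the `sk_api_` prefix check is an assumption based only on the format string above.

```python
def auth_headers(api_key: str) -> dict:
    """Return request headers for the API (hypothetical helper)."""
    key = api_key.strip()
    # Accept the key with or without a leading "Bearer ".
    if key.startswith("Bearer "):
        key = key[len("Bearer "):]
    # Assumption: keys start with "sk_api_", as in the documented format.
    if not key.startswith("sk_api_"):
        raise ValueError("expected an API key starting with 'sk_api_'")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early on a malformed key gives a clearer error than waiting for the server to reject the request.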

Body

application/json

Request for asynchronous text-to-speech synthesis

text (string, required)

The text to synthesize. Maximum string length: 2500.

Example: "سلام، آپ اِس وقت اوریٹر کی آواز سن رہے ہیں۔"

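Text longer than the 2500-character limit has to be split across several requests. The following is a minimal sketch of one way to do that, preferring sentence boundaries. `split_text` is a hypothetical helper, and it assumes the limit counts characters the way Python's `len()` does, which the page does not confirm.

```python
import re

MAX_TEXT_LENGTH = 2500  # limit stated for the `text` field


def split_text(text: str, limit: int = MAX_TEXT_LENGTH) -> list:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after the Urdu full stop, the Arabic question mark, or . ! ?
    # when followed by whitespace.
    sentences = re.split(r"(?<=[۔؟.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own request; the audio files would need to be joined by the caller afterwards.
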
outputFormat (enum<string>, required)

Format of the output audio. WAV files are usually about 10x larger than MP3 or OGG, so we recommend MP3 or OGG for the best compression while maintaining quality.

Available options: PCM_22050_16, WAV_22050_16, WAV_22050_32, MP3_22050_32, MP3_22050_64, MP3_22050_128, OGG_22050_16, ULAW_8000_8
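
The option names can be validated and taken apart before sending a request. This is a sketch under assumptions: the page does not define the parts of each name, so treating the middle number as a sample rate in Hz and the last number as a bit depth or bitrate is a guess, and the file extensions are a local naming choice. `parse_output_format` is a hypothetical helper.

```python
from typing import NamedTuple

# Values copied from the "Available options" list.
OUTPUT_FORMATS = (
    "PCM_22050_16", "WAV_22050_16", "WAV_22050_32", "MP3_22050_32",
    "MP3_22050_64", "MP3_22050_128", "OGG_22050_16", "ULAW_8000_8",
)

# File extensions are a local convention, not something the API defines.
EXTENSIONS = {"PCM": ".pcm", "WAV": ".wav", "MP3": ".mp3", "OGG": ".ogg", "ULAW": ".ulaw"}


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int  # assumed meaning of the middle number
    quality: int         # assumed: bit depth or bitrate, depending on codec


def parse_output_format(value: str) -> OutputFormat:
    """Validate an outputFormat string and split it into its three parts."""
    if value not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {value!r}")
    codec, rate, quality = value.split("_")
    return OutputFormat(codec, int(rate), int(quality))
```

For example, `EXTENSIONS[parse_output_format("MP3_22050_128").codec]` gives a suffix for the saved file.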

voiceId (string)

Identifier for the voice to use. Named voices: v_meklc281 (Urdu female), v_8eelc901 (Info/Edu), v_kwmp7zxt (Gen Z), v_yypgzenx (Dada Jee), v_30s70t3a (Nostalgic News)

Example: "v_meklc281"

phraseReplacementConfigId (string)

Optional ID of a phrase replacement configuration to apply
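
The body fields above can be assembled with a small builder. This is a minimal sketch: `build_request_body` is a hypothetical helper, and it treats `voiceId` as optional only because the page does not mark it as required.

```python
from typing import Dict, Optional

MAX_TEXT_LENGTH = 2500  # documented limit for `text`


def build_request_body(
    text: str,
    output_format: str,
    voice_id: Optional[str] = None,
    phrase_replacement_config_id: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the JSON body, omitting optional fields that were not given."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    body = {"text": text, "outputFormat": output_format}
    if voice_id is not None:
        body["voiceId"] = voice_id
    if phrase_replacement_config_id is not None:
        body["phraseReplacementConfigId"] = phrase_replacement_config_id
    return body
```

The returned dictionary can be passed to `requests.post(..., json=body)`.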

Response

Successfully initiated audio synthesis

Response containing mediaId and token for retrieving synthesized audio

mediaId (string)

Unique identifier for the generated audio media

Example: "media_abc123xyz"

token (string)

JWT token for secure retrieval of the audio

Example: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

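A sketch of turning the response JSON into a typed value before using it. `AsyncTtsResult` and `parse_async_tts_response` are hypothetical names; only the `mediaId` and `token` fields come from the response described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsyncTtsResult:
    media_id: str
    token: str


def parse_async_tts_response(data: dict) -> AsyncTtsResult:
    """Check that both documented fields are present and non-empty."""
    missing = [key for key in ("mediaId", "token") if not data.get(key)]
    if missing:
        raise ValueError(f"response is missing: {', '.join(missing)}")
    return AsyncTtsResult(media_id=data["mediaId"], token=data["token"])
```

Validating here means a malformed or error response fails with a clear message instead of a `KeyError` later in the retrieval step.
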