Stream text-to-speech audio in real time over a WebSocket connection. Perfect for conversational AI applications that need low-latency audio synthesis.

When to Use WebSocket TTS

Best for: Real-time conversational AI, voice agents, and applications needing continuous TTS streaming with multiple concurrent requests. Visit this tutorial for an implementation.

Key Benefits

  • Low latency: ~300ms to first audio chunk
  • Multiple requests: Handle multiple synthesis requests on a single connection
  • Real-time streaming: Audio chunks stream as they’re generated
  • Persistent connection: Reuse connection for entire conversation

Connection

Endpoint

wss://api.upliftai.org/text-to-speech/multi-stream

Authentication

Connect using your API key:
import { io } from 'socket.io-client';

const socket = io('wss://api.upliftai.org/text-to-speech/multi-stream', {
  auth: {
    token: 'sk_api_your_key_here'
  },
  transports: ['websocket']
});
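
If the token is rejected or the network drops, the standard Socket.IO connection events fire. A minimal sketch of listening for them (the event names are standard Socket.IO client events; the exact error payload returned by this API is not specified here):

// Fired once the underlying connection is established
socket.on('connect', () => {
  console.log('WebSocket connected, waiting for "ready"...');
});

// Fired when the connection attempt is rejected (e.g. a bad API key)
socket.on('connect_error', (err) => {
  console.error('Connection failed:', err.message);
});

// Fired when the server or network closes the connection
socket.on('disconnect', (reason) => {
  console.warn('Disconnected:', reason);
});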

Message Protocol

All messages use a unified format with a type field.

Client → Server Messages

Synthesize Text

{
  "type": "synthesize",
  "requestId": "unique_request_id",
  "text": "سلام، آپ کیسے ہیں؟",
  "voiceId": "v_meklc281",
  "outputFormat": "MP3_22050_32"
}
Parameters:
  • requestId: Unique ID for tracking this request
  • text: Text to synthesize (max 10,000 characters)
  • voiceId: Voice to use (e.g., "v_meklc281" for Urdu female)
  • outputFormat: Audio format (optional, defaults to PCM_22050_16)
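
For example, a minimal sketch that emits a synthesize message with a generated request ID (crypto.randomUUID() needs a modern browser or Node 19+; the synthesize helper name is illustrative, not part of the API):

// Sketch: send one synthesize request and return its ID for tracking
function synthesize(socket, text) {
  const requestId = crypto.randomUUID(); // unique per request
  socket.emit('synthesize', {
    type: 'synthesize',
    requestId,
    text,
    voiceId: 'v_meklc281',
    outputFormat: 'MP3_22050_32'
  });
  return requestId;
}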

Cancel Request

{
  "type": "cancel",
  "requestId": "unique_request_id"
}
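
A sketch of sending a cancel for an in-flight request. It assumes the emitted event name matches the message type, the same pattern the synthesize example below uses; that mapping is an assumption, not stated explicitly by the protocol description:

// Sketch: cancel a previously submitted request
// (assumes the event name mirrors the message type, as with 'synthesize')
socket.emit('cancel', {
  type: 'cancel',
  requestId: 'req_001'
});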

Server → Client Messages

All server messages come through the message event:

Connection Ready

{
  "type": "ready",
  "sessionId": "session_abc123"
}

Audio Start

{
  "type": "audio_start",
  "requestId": "unique_request_id",
  "timestamp": 1234567890
}

Audio Chunk

{
  "type": "audio",
  "requestId": "unique_request_id",
  "audio": "base64_encoded_audio_data",
  "sequence": 0
}

Audio End

{
  "type": "audio_end",
  "requestId": "unique_request_id",
  "timestamp": 1234567890
}
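
One way to handle these events is to buffer chunks per requestId and join them once audio_end arrives. A minimal Node.js sketch (Buffer is Node-only; in the browser decode with atob instead):

// Sketch: buffer chunks per request and assemble them when audio_end arrives
const pending = new Map(); // requestId -> array of { seq, buf }

socket.on('message', (data) => {
  if (data.type === 'audio') {
    if (!pending.has(data.requestId)) pending.set(data.requestId, []);
    pending.get(data.requestId).push({
      seq: data.sequence,
      buf: Buffer.from(data.audio, 'base64')
    });
  } else if (data.type === 'audio_end') {
    const parts = (pending.get(data.requestId) ?? [])
      .sort((a, b) => a.seq - b.seq)   // use the sequence field to guarantee order
      .map((p) => p.buf);
    pending.delete(data.requestId);
    const full = Buffer.concat(parts); // complete audio for this request
    // hand `full` to your player or file writer...
  }
});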

Error

{
  "type": "error",
  "requestId": "unique_request_id",
  "code": "synthesis_failed",
  "message": "Voice not found"
}
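
Error messages carry a machine-readable code (the full list is under Error Codes below). A sketch of splitting retryable errors from caller errors; the grouping shown is a suggestion, adjust it to your needs:

// Sketch: route server errors by code
socket.on('message', (data) => {
  if (data.type !== 'error') return;
  switch (data.code) {
    case 'rate_limit_exceeded':
    case 'synthesis_failed':
      // transient: retry the request later with backoff
      console.warn(`Retryable error for ${data.requestId}: ${data.message}`);
      break;
    case 'auth_failed':
    case 'text_too_long':
    case 'duplicate_request':
      // caller error: fix the request before resending
      console.error(`Request error for ${data.requestId}: ${data.message}`);
      break;
  }
});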

Simple Example

import { io } from 'socket.io-client';

// Connect to WebSocket
const socket = io('wss://api.upliftai.org/text-to-speech/multi-stream', {
  auth: { token: 'sk_api_your_key' },
  transports: ['websocket']
});

// Handle messages
socket.on('message', (data) => {
  switch(data.type) {
    case 'ready':
      console.log('Connected!');
      // Start synthesis
      socket.emit('synthesize', {
        type: 'synthesize',
        requestId: 'req_001',
        text: 'سلام، یہ ایک ٹیسٹ ہے۔',
        voiceId: 'v_meklc281',
        outputFormat: 'MP3_22050_32'
      });
      break;
      
    case 'audio': {
      // Decode and play audio chunk
      const audioData = Buffer.from(data.audio, 'base64');
      // Play audioData...
      break;
    }
      
    case 'audio_end':
      console.log('Audio complete!');
      break;
      
    case 'error':
      console.error('Error:', data.message);
      break;
  }
});
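
The example above uses Node's Buffer to decode the base64 audio. In the browser, where Buffer does not exist, a small sketch using atob does the same job:

// Browser-side base64 decoding (no Buffer available)
function decodeAudioChunk(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes; // pass to Web Audio, MediaSource, etc.
}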

Output Formats

Format        | Description                | Use Case
PCM_22050_16  | Raw PCM, 22.05 kHz, 16-bit | Direct audio processing
MP3_22050_32  | MP3, 22.05 kHz, 32 kbps    | Small file size, web
MP3_22050_128 | MP3, 22.05 kHz, 128 kbps   | High-quality streaming
WAV_22050_32  | WAV, 22.05 kHz, 32-bit     | Lossless audio
ULAW_8000_8   | μ-law, 8 kHz, 8-bit        | Telephony systems
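
For example, raw PCM_22050_16 can be played directly with the Web Audio API by converting the 16-bit samples to floats. A sketch, assuming mono output (the channel count is not stated above):

// Sketch: play a decoded PCM_22050_16 chunk with the Web Audio API
// (`bytes` is a Uint8Array of little-endian 16-bit samples)
function playPcmChunk(audioCtx, bytes) {
  const samples = new Int16Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 2);
  const floats = Float32Array.from(samples, (s) => s / 32768);
  const buffer = audioCtx.createBuffer(1, floats.length, 22050); // mono, 22.05 kHz
  buffer.copyToChannel(floats, 0);
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.start();
}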

Available Voices

Use the same voice IDs as REST API:
  • v_meklc281 - Urdu female
  • v_8eelc901 - Info/Education
  • v_30s70t3a - Nostalgic News
  • v_yypgzenx - Dada Jee (storytelling)

Error Codes

Code                | Description               | Action
auth_failed         | Invalid API key           | Check your API key
synthesis_failed    | TTS service error         | Retry with backoff
duplicate_request   | Request ID already used   | Use unique IDs
rate_limit_exceeded | Too many requests         | Slow down requests
text_too_long       | Text > 10,000 characters  | Split into chunks

Rate Limits

  • Synthesis requests: 60 per minute per connection
  • Cancel requests: 100 per minute per connection
  • Max text length: 10,000 characters per request
  • Daily limit: Based on your plan

Best Practices

  • Generate unique IDs (such as UUIDs) for each synthesis request so audio chunks can be tracked reliably.
  • Keep one WebSocket connection open and reuse it for multiple synthesis requests.
  • Buffer audio chunks before playback for a smooth streaming experience.
  • Implement exponential backoff for reconnection attempts on connection loss (see the sketch below).
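
The Socket.IO client already reconnects with exponential backoff and jitter; a sketch of tuning it (these are standard Socket.IO client options, not specific to this API):

// Sketch: tune Socket.IO's built-in exponential-backoff reconnection
const socket = io('wss://api.upliftai.org/text-to-speech/multi-stream', {
  auth: { token: 'sk_api_your_key' },
  transports: ['websocket'],
  reconnection: true,
  reconnectionDelay: 1000,      // first retry after ~1s
  reconnectionDelayMax: 10000,  // cap the delay at 10s
  randomizationFactor: 0.5      // add jitter to avoid thundering herds
});

// Manager-level event with the current attempt number
socket.io.on('reconnect_attempt', (attempt) => {
  console.log(`Reconnect attempt ${attempt}`);
});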

Testing with wscat

Quick test using command line:
# Install wscat
npm install -g wscat

# Connect
wscat -c wss://api.upliftai.org/text-to-speech/multi-stream \
  -H "Authorization: Bearer sk_api_your_key"

# Send synthesize message
{"type":"synthesize","requestId":"test-1","text":"Hello world","voiceId":"v_meklc281"}

Next Steps
