Stream text-to-speech audio in real time over a WebSocket connection. Well suited to conversational AI applications that need low-latency audio synthesis.

When to Use WebSocket TTS

Best for: Real-time conversational AI, voice agents, and applications needing continuous TTS streaming with multiple concurrent requests. Visit this tutorial for an implementation.

Key Benefits

  • Low latency: ~300 ms to the first audio chunk
  • Multiple requests: Handle multiple synthesis requests on a single connection
  • Real-time streaming: Audio chunks stream as they’re generated
  • Persistent connection: Reuse one connection for the entire conversation

Connection

Endpoint

wss://api.upliftai.org/text-to-speech/multi-stream

Authentication

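Connect with the Socket.IO client, passing your API key as the auth token:
import { io } from 'socket.io-client';

const socket = io('wss://api.upliftai.org/text-to-speech/multi-stream', {
  auth: {
    token: 'sk_api_your_key_here' // your API key
  },
  transports: ['websocket'] // use WebSocket only, no HTTP long-polling
});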

Message Protocol

All messages use a unified format with a type field.

Client → Server Messages

Synthesize Text

{
  "type": "synthesize",
  "requestId": "unique_request_id",
  "text": "سلام، آپ کیسے ہیں؟",
  "voiceId": "v_meklc281",
  "outputFormat": "MP3_22050_32"
}
Parameters:
  • requestId: Unique ID for tracking this request
  • text: Text to synthesize (max 10,000 characters)
  • voiceId: Voice to use (e.g., “v_meklc281” for Urdu female)
  • outputFormat: Audio format (optional, defaults to PCM_22050_16)
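
The limits above can be checked on the client before anything is sent. The following is a minimal sketch, not part of the API: `splitText` and `buildSynthesizeMessages` are hypothetical helpers, and the `idPrefix` scheme for request IDs is an assumption. Length is counted in UTF-16 code units, which may differ from how the server counts characters.

```javascript
// Documented maximum text length per synthesize request.
const MAX_CHARS = 10000;

// Split text into pieces of at most maxChars, preferring to break right
// after sentence-ending punctuation (including the Urdu full stop).
function splitText(text, maxChars = MAX_CHARS) {
  const pieces = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    // Find the last sentence end inside the allowed window.
    let cut = -1;
    for (const mark of ['۔', '.', '؟', '?', '!']) {
      cut = Math.max(cut, head.lastIndexOf(mark));
    }
    // Fall back to a hard cut when no sentence end is found.
    const end = cut >= 0 ? cut + 1 : maxChars;
    pieces.push(rest.slice(0, end).trim());
    rest = rest.slice(end).trim();
  }
  if (rest.length > 0) pieces.push(rest);
  return pieces;
}

// Build one "synthesize" message per piece. idPrefix should be unique per
// call so that request IDs are never reused on the same connection.
function buildSynthesizeMessages(text, voiceId, { idPrefix = 'req', outputFormat } = {}) {
  return splitText(text).map((piece, i) => {
    const message = {
      type: 'synthesize',
      requestId: `${idPrefix}_${i}`,
      text: piece,
      voiceId,
    };
    // outputFormat is optional; omit it to use the server default.
    if (outputFormat) message.outputFormat = outputFormat;
    return message;
  });
}
```

Each returned object has the shape shown under Synthesize Text and can be sent as-is.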

Cancel Request

{
  "type": "cancel",
  "requestId": "unique_request_id"
}

Server → Client Messages

All server messages come through the message event:

Connection Ready

{
  "type": "ready",
  "sessionId": "session_abc123"
}

Audio Start

{
  "type": "audio_start",
  "requestId": "unique_request_id",
  "timestamp": 1234567890
}

Audio Chunk

{
  "type": "audio",
  "requestId": "unique_request_id",
  "audio": "base64_encoded_audio_data",
  "sequence": 0
}

Audio End

{
  "type": "audio_end",
  "requestId": "unique_request_id",
  "timestamp": 1234567890
}
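
Because several requests can share one connection, a client usually has to group audio chunks by requestId between audio_start and audio_end. A minimal sketch of that bookkeeping follows. `AudioAssembler` is a hypothetical helper, it assumes `sequence` increases per request (sorting is only a defensive step), and it relies on the Node.js `Buffer` global.

```javascript
// Collects base64 audio chunks per requestId and returns the joined
// audio once the matching audio_end message arrives.
class AudioAssembler {
  constructor() {
    this.pending = new Map(); // requestId -> array of { sequence, bytes }
  }

  // Feed every server message here. Returns a Buffer with the complete
  // audio on audio_end, otherwise null.
  handle(message) {
    const { type, requestId } = message;
    if (type === 'audio_start') {
      this.pending.set(requestId, []);
    } else if (type === 'audio') {
      if (!this.pending.has(requestId)) this.pending.set(requestId, []);
      this.pending.get(requestId).push({
        sequence: message.sequence,
        bytes: Buffer.from(message.audio, 'base64'),
      });
    } else if (type === 'audio_end' || type === 'error') {
      const chunks = this.pending.get(requestId) || [];
      this.pending.delete(requestId); // free memory in both cases
      if (type === 'error') return null;
      chunks.sort((a, b) => a.sequence - b.sequence);
      return Buffer.concat(chunks.map((chunk) => chunk.bytes));
    }
    return null;
  }
}
```

For live playback you would hand each chunk to the player as it arrives instead of waiting for audio_end; buffering like this is mainly useful when saving the result to a file.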

Error

{
  "type": "error",
  "requestId": "unique_request_id",
  "code": "synthesis_failed",
  "message": "Voice not found"
}

Simple Example

import { io } from 'socket.io-client';

// Connect to WebSocket
const socket = io('wss://api.upliftai.org/text-to-speech/multi-stream', {
  auth: { token: 'sk_api_your_key' },
  transports: ['websocket']
});

// Handle messages
socket.on('message', (data) => {
  switch(data.type) {
    case 'ready':
      console.log('Connected!');
      // Start synthesis
      socket.emit('synthesize', {
        type: 'synthesize',
        requestId: 'req_001',
        text: 'سلام، یہ ایک ٹیسٹ ہے۔',
        voiceId: 'v_meklc281',
        outputFormat: 'MP3_22050_32'
      });
      break;
      
    case 'audio': {
      // Decode the base64 audio chunk (Buffer is available in Node.js)
      const audioData = Buffer.from(data.audio, 'base64');
      // Play audioData...
      break;
    }
      
    case 'audio_end':
      console.log('Audio complete!');
      break;
      
    case 'error':
      console.error('Error:', data.message);
      break;
  }
});

Output Formats

Format          Description                  Use Case
PCM_22050_16    Raw PCM, 22.05 kHz, 16-bit   Direct audio processing
MP3_22050_32    MP3, 22.05 kHz, 32 kbps      Small file size, web
MP3_22050_128   MP3, 22.05 kHz, 128 kbps     High quality streaming
WAV_22050_32    WAV, 22.05 kHz, 32-bit       Lossless audio
ULAW_8000_8     μ-law, 8 kHz, 8-bit          Telephony systems
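
Raw PCM has no header, so most players cannot open it directly. One common approach is to prepend a standard 44-byte WAV header. The sketch below does that for PCM_22050_16; `pcmToWav` is a hypothetical helper, and it assumes mono, little-endian signed samples, which this page does not state.

```javascript
// Wrap raw PCM bytes in a minimal 44-byte RIFF/WAVE header.
// Assumption: mono, little-endian signed samples (not confirmed here).
function pcmToWav(pcm, sampleRate = 22050, bitsPerSample = 16, channels = 1) {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4);  // file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);              // size of the fmt chunk
  header.writeUInt16LE(1, 20);               // audio format 1 = linear PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);      // size of the sample data
  return Buffer.concat([header, pcm]);
}
```

Pass the concatenated PCM bytes of one finished request; the result can be written to a `.wav` file with `fs.writeFileSync`.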

Available Voices

Use the same voice IDs as the REST API:
  • v_meklc281 - Urdu female
  • v_8eelc901 - Info/Education
  • v_30s70t3a - Nostalgic News
  • v_yypgzenx - Dada Jee (storytelling)

Error Codes

Code                  Description               Action
auth_failed           Invalid API key           Check your API key
synthesis_failed      TTS service error         Retry with backoff
duplicate_request     Request ID already used   Use unique IDs
rate_limit_exceeded   Too many requests         Slow down requests
text_too_long         Text > 10,000 chars       Split into chunks
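
The actions in this table can be turned into a small decision helper. The sketch below is one possible reading, not a prescribed client: the action names, `planErrorResponse`, and the backoff values (500 ms base, 8 s cap, 4 attempts) are assumptions, and treating rate_limit_exceeded as "retry after a delay" is an interpretation of "Slow down requests".

```javascript
// Documented error codes mapped to an illustrative client action.
const ERROR_ACTIONS = new Map([
  ['auth_failed', 'fail'],           // check the API key, do not retry
  ['synthesis_failed', 'retry'],     // retry with backoff
  ['duplicate_request', 'new_id'],   // resend with a fresh requestId
  ['rate_limit_exceeded', 'retry'],  // slow down, then retry
  ['text_too_long', 'split'],        // split the text into smaller pieces
]);

// Exponential backoff: baseMs, 2x baseMs, 4x baseMs, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Decide what to do with a server "error" message.
// attempt is the number of retries already made for this request.
function planErrorResponse(errorMessage, attempt = 0, maxAttempts = 4) {
  const action = ERROR_ACTIONS.get(errorMessage.code) || 'fail';
  if (action === 'retry') {
    if (attempt >= maxAttempts) return { action: 'fail' };
    return { action, delayMs: backoffDelayMs(attempt) };
  }
  return { action };
}
```

A caller would wait `delayMs` with `setTimeout` before resending, using a new requestId for the retry.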

Rate Limits

  • Synthesis requests: 60 per minute per connection
  • Cancel requests: 100 per minute per connection
  • Max text length: 10,000 characters per request
  • Daily limit: Based on your plan
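
To stay under the per-connection limits, a client can throttle itself before sending. The following is a minimal sliding-window sketch; `SlidingWindowLimiter` is a hypothetical helper, and the server may count requests over a different kind of window, so treat the numbers as an approximation.

```javascript
// Allows at most `limit` sends within any `windowMs` period.
// `now` is injectable so the limiter can be tested with a fake clock.
class SlidingWindowLimiter {
  constructor(limit = 60, windowMs = 60000, now = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
    this.sent = []; // timestamps of recent sends, oldest first
  }

  // Returns 0 and records the send if a request may go out now,
  // otherwise returns how many milliseconds to wait before trying again.
  tryAcquire() {
    const t = this.now();
    // Drop timestamps that have left the window.
    while (this.sent.length > 0 && t - this.sent[0] >= this.windowMs) {
      this.sent.shift();
    }
    if (this.sent.length < this.limit) {
      this.sent.push(t);
      return 0;
    }
    return this.sent[0] + this.windowMs - t;
  }
}
```

One limiter with the defaults would cover synthesis requests, and a second one created with a limit of 100 would cover cancel requests.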

Best Practices

Testing with wscat

Quick test from the command line:
# Install wscat
npm install -g wscat

# Connect
wscat -c wss://api.upliftai.org/text-to-speech/multi-stream \
  -H "Authorization: Bearer sk_api_your_key"

# Send a synthesize message (type it at the wscat prompt once connected)
{"type":"synthesize","requestId":"test-1","text":"Hello world","voiceId":"v_meklc281"}
