Realtime Assistants Overview

The Realtime Assistants API is currently in beta. Features and specifications may change as we continue to improve the platform.

What are Realtime Assistants?

Realtime Assistants are AI-powered voice agents that hold natural, real-time conversations with users. They combine UpliftAI models, agent hosting, and WebRTC delivery, so you can add a voice assistant to a web or mobile app with little setup. They provide:

End-to-end latency of roughly 1 second for natural conversations (varies with the STT, LLM, and TTS models you choose)
Natural conversation flow with interruption handling
Multi-modal capabilities supporting voice, text, and custom tools
Dynamic configuration for real-time behavior updates
- Update the tools available to the agent mid-session
- Replace the agent prompt entirely mid-session
Scalable infrastructure supporting thousands of concurrent sessions

Key Features

Voice-First Design

Natural speech recognition and synthesis with support for multiple languages and voices

Custom Tools

Extend your assistant with custom functions that can access external APIs and services

Real-time Updates

Update instructions and tools on the fly without restarting sessions

Easy Integration

Simple SDKs for React, JavaScript, and mobile platforms

Use Cases

Realtime Assistants are perfect for:

Customer Support - 24/7 voice-enabled support agents
Virtual Receptionists - Automated call handling and routing
Educational Tutors - Interactive learning experiences
Healthcare Assistants - Patient intake and appointment scheduling
Sales Agents - Product demonstrations and lead qualification
Personal Assistants - Task management and information retrieval

Architecture Overview

Provider Support

Speech-to-Text (STT)

Groq Whisper (whisper-large-v3, recommended for Pakistani languages)
Deepgram (nova-3, recommended for English)
OpenAI (gpt-4o-transcribe or gpt-4o-mini-transcribe)
UpliftAI (a dedicated STT model for Pakistani languages is coming soon)

Text-to-Speech (TTS)

UpliftAI Orator (ultra-fast, natural voices; supports Urdu, Sindhi, and Balochi)
- See available voices
OpenAI (standard voices; use the gpt-4o-mini-tts model)

Language Models (LLM)

Groq (recommended)
- openai/gpt-oss-120b (best quality)
- openai/gpt-oss-20b (faster responses)
OpenAI GPT-4 (alternative)
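
The recommendations above can be captured in a small helper that picks providers for an assistant config. This is a minimal sketch, not part of the UpliftAI SDK: the function names are hypothetical, the provider id "deepgram" is an assumption (only "groq" and "upliftai" appear in the example config on this page), and the language code "ur" for Urdu is assumed to be accepted.

```typescript
// Shape of one "default" provider entry, modeled on the example config on this page.
interface ProviderChoice {
  provider: string;
  model: string;
  language?: string;
}

// Hypothetical helper: follows the STT recommendations listed above.
// Deepgram nova-3 for English, Groq whisper-large-v3 for everything else
// (including Pakistani languages). The "deepgram" provider id is an assumption.
function pickStt(language: string): ProviderChoice {
  if (language === "en") {
    return { provider: "deepgram", model: "nova-3", language };
  }
  return { provider: "groq", model: "whisper-large-v3", language };
}

// Hypothetical helper: trades quality against response speed using the
// two Groq-hosted models listed above.
function pickLlm(priority: "quality" | "speed"): ProviderChoice {
  const model = priority === "quality" ? "openai/gpt-oss-120b" : "openai/gpt-oss-20b";
  return { provider: "groq", model };
}

// Example: an Urdu assistant tuned for fast replies.
const sttForUrdu = pickStt("ur");
const fastLlm = pickLlm("speed");
console.log(sttForUrdu, fastLlm);
```

The returned objects are meant to slot into the stt.default and llm.default fields of the assistant configuration shown under Getting Started.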

Getting Started

Create an Assistant

Use the API or platform to create your first assistant configuration

curl -X POST https://api.upliftai.org/v1/realtime-assistants \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Dream Psychologist",
    "description": "Understands the mysterious world of your dreams",
    "config": {
      "agent": {
        "instructions": "You are a dream interpreter who helps people understand their dreams.",
        "initialGreeting": true,
        "greetingInstructions": "Salam! How may I assist you today?"
      },
      "stt": {
        "default": {
          "provider": "groq",
          "model": "whisper-large-v3",
          "language": "en"
        }
      },
      "tts": {
        "default": {
          "provider": "upliftai",
          "voiceId": "v_meklc281",
          "outputFormat": "MP3_22050_32"
        }
      },
      "llm": {
        "default": {
          "provider": "groq",
          "model": "openai/gpt-oss-120b"
        }
      },
      "session": {
        "ttl": 1800
      }
    }
  }'
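
The same request can be made from TypeScript. The sketch below mirrors the curl call using the standard fetch API; the endpoint, headers, and body fields are copied from the example above, while the helper names (buildAssistantRequest, createAssistant) and the interfaces are hypothetical. The response shape is not documented on this page, so it is returned as untyped JSON.

```typescript
// Request body types, modeled on the curl example above (not an official SDK type).
interface AssistantRequest {
  name: string;
  description: string;
  config: {
    agent: { instructions: string; initialGreeting: boolean; greetingInstructions: string };
    stt: { default: { provider: string; model: string; language: string } };
    tts: { default: { provider: string; voiceId: string; outputFormat: string } };
    llm: { default: { provider: string; model: string } };
    session: { ttl: number };
  };
}

// Hypothetical helper: builds the same body as the curl example,
// with the name, description, and instructions supplied by the caller.
function buildAssistantRequest(name: string, description: string, instructions: string): AssistantRequest {
  return {
    name,
    description,
    config: {
      agent: {
        instructions,
        initialGreeting: true,
        greetingInstructions: "Salam! How may I assist you today?",
      },
      stt: { default: { provider: "groq", model: "whisper-large-v3", language: "en" } },
      tts: { default: { provider: "upliftai", voiceId: "v_meklc281", outputFormat: "MP3_22050_32" } },
      llm: { default: { provider: "groq", model: "openai/gpt-oss-120b" } },
      session: { ttl: 1800 },
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON response.
async function createAssistant(apiKey: string, body: AssistantRequest): Promise<unknown> {
  const response = await fetch("https://api.upliftai.org/v1/realtime-assistants", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Create assistant failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A call would look like createAssistant(apiKey, buildAssistantRequest("Dream Psychologist", "Understands the mysterious world of your dreams", "You are a dream interpreter who helps people understand their dreams.")).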

Create a Session

Generate a session token for your client application

curl -X POST https://api.upliftai.org/v1/realtime-assistants/{assistantId}/createSession \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "participantName": "User"
  }'
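
A TypeScript equivalent of the session request might look like the sketch below. The URL pattern, headers, and participantName field come from the curl example; the helper names are hypothetical, and the response is left untyped because its fields are not described on this page. Since the call uses your API key, it presumably belongs on a backend rather than in client code.

```typescript
// Hypothetical helper: fills the {assistantId} placeholder from the curl example.
// The id is URL-encoded so unexpected characters cannot alter the path.
function createSessionUrl(assistantId: string): string {
  const id = encodeURIComponent(assistantId);
  return `https://api.upliftai.org/v1/realtime-assistants/${id}/createSession`;
}

// Hypothetical helper: requests a session for the given assistant and
// returns the parsed JSON response without assuming its shape.
async function createSession(
  apiKey: string,
  assistantId: string,
  participantName: string
): Promise<unknown> {
  const response = await fetch(createSessionUrl(assistantId), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ participantName }),
  });
  if (!response.ok) {
    throw new Error(`Create session failed with HTTP ${response.status}`);
  }
  return response.json();
}
```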

Connect Your Client

Use our SDKs to connect to the session

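import { UpliftAIRoom } from '@upliftai/assistants-react';

// sessionToken: the token generated in the Create a Session step
// wsUrl: the server URL for the session
// YourAssistantUI: your own component, rendered inside the room
<UpliftAIRoom
  token={sessionToken}
  serverUrl={wsUrl}
  connect={true}
  audio={true}
>
  <YourAssistantUI />
</UpliftAIRoom>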

Next Steps

Read the Concepts

Understand how Realtime Assistants work under the hood

Try the Tutorial

Build your first voice assistant in 10 minutes

Explore the API

Deep dive into the API endpoints

View Examples

Check out our example implementations
