The Realtime Assistants API is currently in beta. Features and specifications may change as we continue to improve the platform.
What are Realtime Assistants?
Realtime Assistants are AI-powered voice agents that can engage in natural, real-time conversations with users. The key benefit is that you can easily build voice assistants using UpliftAI models, agent hosting, and WebRTC delivery in frontend, mobile, or web apps. They provide:
- End-to-end latency of ~1 second for natural conversations (varies with model choices and other factors)
- Natural conversation flow with interruption handling
- Multi-modal capabilities supporting voice, text, and custom tools
- Dynamic configuration for real-time behavior updates
  - Update the tools available to the agent mid-session
  - Completely replace the agent prompt mid-session
- Scalable infrastructure supporting thousands of concurrent sessions
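To make the dynamic-configuration idea concrete, here is a minimal local sketch of what replacing the prompt and tool set mid-session could look like as a data operation. All type and function names (`SessionConfig`, `SessionUpdate`, `applyUpdate`) are assumptions made for illustration and are not taken from the Realtime Assistants API or its SDKs.

```typescript
// Hypothetical shapes: none of these names come from the Realtime Assistants API.
interface ToolDefinition {
  name: string;
  description: string;
}

interface SessionConfig {
  instructions: string;
  tools: ToolDefinition[];
}

// A partial update: any omitted field keeps its current value.
type SessionUpdate = Partial<SessionConfig>;

// Returns a new config. The prompt and the tool list are each replaced
// wholesale when present in the update; the input config is not mutated.
function applyUpdate(current: SessionConfig, update: SessionUpdate): SessionConfig {
  return {
    instructions: update.instructions ?? current.instructions,
    tools: update.tools ? [...update.tools] : current.tools,
  };
}

const initial: SessionConfig = {
  instructions: "You are a support agent. Greet the caller.",
  tools: [{ name: "lookup_order", description: "Find an order by ID" }],
};

// Mid-session: swap the prompt and the tool set without creating a new session.
const escalated = applyUpdate(initial, {
  instructions: "You are a billing specialist. Resolve the refund request.",
  tools: [{ name: "issue_refund", description: "Refund an order" }],
});

console.log(escalated.tools.map((t) => t.name));
```

In a real integration the update would be sent to the running session rather than applied locally; the sketch only shows the replace-not-merge semantics that the bullets above describe.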
Key Features
Voice-First Design
Natural speech recognition and synthesis with support for multiple languages and voices
Custom Tools
Extend your assistant with custom functions that can access external APIs and services
Real-time Updates
Update instructions and tools on the fly without restarting sessions
Easy Integration
Simple SDKs for React, JavaScript, and mobile platforms
Use Cases
Realtime Assistants are perfect for:
- Customer Support - 24/7 voice-enabled support agents
- Virtual Receptionists - Automated call handling and routing
- Educational Tutors - Interactive learning experiences
- Healthcare Assistants - Patient intake and appointment scheduling
- Sales Agents - Product demonstrations and lead qualification
- Personal Assistants - Task management and information retrieval
Architecture Overview
Provider Support
Speech-to-Text (STT)
- Groq Whisper (recommended for Pakistani languages, whisper-large-v3)
- Deepgram (nova-3, recommended for English)
- OpenAI: gpt-4o-transcribe or gpt-4o-mini-transcribe
- UpliftAI: the best Pakistani STT, coming soon!
Text-to-Speech (TTS)
- UpliftAI Orator (ultra-fast, natural voices - supports Urdu, Sindhi, Balochi)
- See available voices
- OpenAI (standard voices), use model gpt-4o-mini-tts
Language Models (LLM)
- Groq (recommended)
  - openai/gpt-oss-120b (best quality)
  - openai/gpt-oss-20b (faster responses)
- OpenAI GPT-4 (alternative)
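The recommendations above can be summarized as a small selection helper. This is a sketch under stated assumptions: the model identifiers come from the lists above, but the provider ID strings, the `ModelChoice` shape, and the function names are hypothetical and not part of any documented configuration schema. It covers only the STT and LLM recommendations.

```typescript
// Hypothetical helper: only the model IDs are taken from the provider lists.
type Language = "english" | "urdu" | "sindhi" | "balochi";

interface ModelChoice {
  provider: string; // assumed provider ID string
  model: string;
}

// STT: Deepgram nova-3 for English, Groq Whisper for Pakistani languages.
function pickStt(language: Language): ModelChoice {
  return language === "english"
    ? { provider: "deepgram", model: "nova-3" }
    : { provider: "groq", model: "whisper-large-v3" };
}

// LLM: Groq is the recommended provider; pick the model by priority.
function pickLlm(priority: "quality" | "speed"): ModelChoice {
  return {
    provider: "groq",
    model: priority === "quality" ? "openai/gpt-oss-120b" : "openai/gpt-oss-20b",
  };
}

const urduAgent = { stt: pickStt("urdu"), llm: pickLlm("speed") };
console.log(urduAgent);
```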
Getting Started
1. Create an Assistant
   Use the API or platform to create your first assistant configuration
2. Create a Session
   Generate a session token for your client application
3. Connect Your Client
   Use our SDKs to connect to the session
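The three steps can be sketched as request builders to show who does what. Everything in this sketch is hypothetical: the base URL is a placeholder, and the paths, field names, and Bearer-token header are assumptions chosen for illustration, not the actual Realtime Assistants endpoints. Consult the API reference for the real request shapes.

```typescript
// Everything here is a placeholder: paths, fields, auth scheme, and base URL.
interface ApiRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

const BASE_URL = "https://api.example.com"; // placeholder, not the real endpoint

function buildRequest(path: string, apiKey: string, payload: unknown): ApiRequest {
  return {
    method: "POST",
    url: `${BASE_URL}${path}`,
    headers: {
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

// Step 1 (server side): describe the assistant once.
function createAssistantRequest(apiKey: string, name: string, instructions: string): ApiRequest {
  return buildRequest("/assistants", apiKey, { name, instructions });
}

// Step 2 (server side): request a session token for one client.
function createSessionRequest(apiKey: string, assistantId: string): ApiRequest {
  return buildRequest(`/assistants/${encodeURIComponent(assistantId)}/sessions`, apiKey, {});
}

// Step 3 (client side): the client is handed only the session token.
function clientConnectOptions(sessionToken: string): { token: string } {
  return { token: sessionToken };
}

const step1 = createAssistantRequest("sk-demo", "Receptionist", "Answer calls politely.");
const step2 = createSessionRequest("sk-demo", "asst 1");
console.log(step1.url, step2.url);
```

The design point is the split between steps 2 and 3: the API key stays on your server, and the client application receives only the session token it needs to connect.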