The Realtime Assistants API is currently in beta. We’re actively improving performance and adding features based on user feedback.
How Realtime Assistants Work
Realtime Assistants combine several advanced technologies to create seamless voice interactions:
- WebRTC Connection - Low-latency audio/video streaming
- Speech Processing - Real-time STT and TTS
- AI Agent - Intelligent conversation management
- Tool Execution - Dynamic function calling via RPC
Key Components
Sessions
A session represents a single conversation between a user and an assistant. Each session:
- Has a unique room identifier
- Supports multiple participants (user + agent)
- Maintains conversation context
- Can be configured with custom settings
- Has a configurable TTL (time-to-live)
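The session properties listed above can be pictured as a small record. This is only a sketch: the `SessionRecord` shape, its field names, and the `isExpired` helper are hypothetical and are not taken from the UpliftAI API.

```typescript
// Hypothetical shape of a session record; field names are illustrative only.
interface SessionRecord {
  roomId: string;          // unique room identifier
  participants: string[];  // e.g. the user and the agent
  createdAtMs: number;     // creation time in epoch milliseconds
  ttlSeconds: number;      // configurable time-to-live
}

// Returns true once the session has reached or passed its TTL.
function isExpired(session: SessionRecord, nowMs: number): boolean {
  return nowMs >= session.createdAtMs + session.ttlSeconds * 1000;
}

const demoSession: SessionRecord = {
  roomId: "room-123",
  participants: ["user", "agent"],
  createdAtMs: 0,
  ttlSeconds: 600,
};
```

A client or backend could run a check like `isExpired(demoSession, Date.now())` before trying to reuse a session.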
Agents
The AI agent is the brain of your assistant. It:
- Processes user speech in real-time
- Manages conversation flow
- Handles interruptions gracefully
- Executes tools when needed
- Maintains context throughout the session
Configuration
Each assistant can be configured with:
- Agent Settings
- STT Settings
- TTS Settings
- LLM Settings
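One way to think about these four groups is as a single nested configuration object. The sketch below is an assumption made for illustration: the `AssistantConfig` type, every field inside it, and the `applyPatch` helper are hypothetical, and the real settings may be named and structured differently.

```typescript
// Hypothetical configuration shape; all field names are assumptions.
interface AssistantConfig {
  agent: { instructions: string };
  stt: { language: string };
  tts: { voiceId: string };
  llm: { model: string; temperature: number };
}

// A patch may override any subset of fields within any group.
type ConfigPatch = { [K in keyof AssistantConfig]?: Partial<AssistantConfig[K]> };

// Returns a new config with the patch merged group by group; the base is not mutated.
function applyPatch(base: AssistantConfig, patch: ConfigPatch): AssistantConfig {
  return {
    agent: { ...base.agent, ...patch.agent },
    stt: { ...base.stt, ...patch.stt },
    tts: { ...base.tts, ...patch.tts },
    llm: { ...base.llm, ...patch.llm },
  };
}

const baseConfig: AssistantConfig = {
  agent: { instructions: "You are a helpful voice assistant." },
  stt: { language: "en" },
  tts: { voiceId: "example-voice" },
  llm: { model: "example-model", temperature: 0.7 },
};
```

Merging per group keeps a partial update, such as changing only the temperature, from wiping out the other fields in that group.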
Tools and Functions
Tools extend your assistant’s capabilities. The agent can invoke a tool over RPC on the user’s device or, in a future release (coming soon), by calling an external API directly. Your frontend can then execute custom tools that:
- Access external APIs
- Perform calculations
- Query databases
- Execute custom business logic
- Connect to MCP servers
- Bridge to enterprise systems and common platforms such as Shopify
Tool Definition
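A tool definition generally pairs a name and a description with a schema for the tool's parameters. The sketch below is illustrative only: the `ToolDefinition` shape, the `get_order_status` tool, and the `missingParams` helper are assumptions, not the documented UpliftAI format.

```typescript
// Hypothetical tool definition shape, loosely modeled on JSON Schema.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: "string" | "number" | "boolean"; description?: string }>;
    required: string[];
  };
}

// Example tool; the name and parameters are made up for illustration.
const getOrderStatusTool: ToolDefinition = {
  name: "get_order_status",
  description: "Look up the status of an order by its identifier.",
  parameters: {
    type: "object",
    properties: {
      orderId: { type: "string", description: "Identifier of the order" },
    },
    required: ["orderId"],
  },
};

// Lists required parameters that are absent from a call's arguments.
function missingParams(tool: ToolDefinition, args: Record<string, unknown>): string[] {
  return tool.parameters.required.filter((key) => !(key in args));
}
```

Keeping the required parameter list in the definition lets a handler reject incomplete calls before doing any work.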
Tool Execution Flow
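On the client side, the flow can be sketched as a dispatcher: the agent's RPC request names a tool and carries JSON arguments, the client looks up the matching handler, runs it, and returns a JSON response. Everything here is an assumption made for illustration, including the `ToolRpcRequest` envelope, the `toolHandlers` registry, and the response shape. Handlers are synchronous to keep the sketch short; real handlers would likely be asynchronous.

```typescript
// A tool handler receives parsed arguments and returns a JSON-serializable result.
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical envelope for a tool call arriving over RPC.
interface ToolRpcRequest {
  tool: string;     // name of the tool to run
  payload: string;  // JSON-encoded arguments
}

// Registry of tools the client is willing to execute.
const toolHandlers = new Map<string, ToolHandler>();
toolHandlers.set("add", (args) => Number(args["a"]) + Number(args["b"]));

// Dispatches one request and always answers with a JSON string.
function handleToolRpc(request: ToolRpcRequest): string {
  const handler = toolHandlers.get(request.tool);
  if (!handler) {
    return JSON.stringify({ ok: false, error: `unknown tool: ${request.tool}` });
  }
  try {
    const args = JSON.parse(request.payload) as Record<string, unknown>;
    return JSON.stringify({ ok: true, result: handler(args) });
  } catch (err) {
    // Covers malformed JSON as well as errors thrown by the handler.
    const message = err instanceof Error ? err.message : String(err);
    return JSON.stringify({ ok: false, error: message });
  }
}
```

Returning a structured error instead of throwing lets the agent tell the user that a tool failed rather than stalling the conversation.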
Dynamic Configuration
Assistants can be updated in real time without disrupting active sessions. The following examples use React with the @upliftai/assistants-react package:
Update Instructions
Change the assistant’s behavior on the fly
Manage Tools
Add or remove tools dynamically:
Connection Lifecycle
1. Session Creation: Client requests a session token from your backend
2. WebRTC Negotiation: Client connects to LiveKit server using the token
3. Agent Join: AI agent joins the room and initializes
4. Conversation: Real-time audio streaming and processing
5. Tool Execution: Agent requests tool execution via RPC when needed
6. Session End: Client disconnects or session expires
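The lifecycle above can be modeled as a small state machine, which is a convenient way for a client to reject out-of-order events. The state names and the `advance` helper below are hypothetical and chosen only to mirror the six steps; this sketch also assumes a session may end from any state after creation.

```typescript
// Hypothetical state names mirroring the lifecycle steps.
type LifecycleState =
  | "idle"
  | "sessionCreated"
  | "connected"
  | "agentJoined"
  | "conversing"
  | "executingTool"
  | "ended";

// Tool execution is entered from, and returns to, the conversation.
const allowedTransitions: Record<LifecycleState, LifecycleState[]> = {
  idle: ["sessionCreated"],
  sessionCreated: ["connected", "ended"],
  connected: ["agentJoined", "ended"],
  agentJoined: ["conversing", "ended"],
  conversing: ["executingTool", "ended"],
  executingTool: ["conversing", "ended"],
  ended: [],
};

// Moves to the next state, or throws if the transition is not allowed.
function advance(current: LifecycleState, next: LifecycleState): LifecycleState {
  if (!allowedTransitions[current].includes(next)) {
    throw new Error(`invalid transition: ${current} -> ${next}`);
  }
  return next;
}
```

For example, `advance("conversing", "executingTool")` succeeds, while `advance("idle", "conversing")` throws because no session exists yet.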
Public vs Private Assistants
If an agent is created with the public flag enabled, a session can be created for it without an API key. This is recommended for public-facing agents, such as those embedded on websites, when user authentication isn’t required.
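The rule can be expressed as a simple check. This sketch uses hypothetical names (`AssistantInfo`, `isPublic`, `canCreateSession`) and only tests whether a key is present; verifying the key itself is out of scope here.

```typescript
// Hypothetical view of an assistant; only the public flag matters here.
interface AssistantInfo {
  id: string;
  isPublic: boolean;
}

// Public assistants need no API key; private ones need a non-empty key.
function canCreateSession(assistant: AssistantInfo, apiKey?: string): boolean {
  if (assistant.isPublic) {
    return true;
  }
  return apiKey !== undefined && apiKey.length > 0;
}
```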
You can use the upliftai widget for public agents. See npm for more options.
Performance Optimization
Latency Reduction
- Geographic Distribution: Agents and models are deployed close to Pakistan
- Provider Selection: Choose fastest providers for your use case
- Connection Pooling: Pre-warmed connections to providers
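For provider selection, one possible approach is to record round-trip latencies per provider and pick the one with the lowest median. This is a sketch of that idea, not a description of how UpliftAI selects providers; the provider names, sample values, and helper functions are hypothetical.

```typescript
// Median of a non-empty list of latency samples, in milliseconds.
function median(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("median requires at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Returns the provider with the lowest median latency, or undefined if none have samples.
function pickFastest(latenciesMs: Record<string, number[]>): string | undefined {
  let best: string | undefined;
  let bestMedian = Infinity;
  for (const [provider, samples] of Object.entries(latenciesMs)) {
    if (samples.length === 0) continue; // skip providers that have not been measured
    const value = median(samples);
    if (value < bestMedian) {
      bestMedian = value;
      best = provider;
    }
  }
  return best;
}
```

The median is used instead of the mean so that a single slow request does not rule out an otherwise fast provider.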
Security Considerations
Always validate and sanitize tool inputs and outputs. Never expose sensitive API keys or credentials in client-side code. Tokens issued by UpliftAI through createSession are safe to use in client-side code.
Best Practices
- Authentication: Always use backend session creation for production
- Tool Validation: Implement strict input validation in tool handlers, and always confirm that inputs are in the format you expect.
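Strict validation in a tool handler might look like the sketch below, which treats the incoming arguments as `unknown` and checks each field before use. The `GetOrderArgs` shape, its fields, and the allowed `orderId` pattern are hypothetical examples.

```typescript
// Hypothetical arguments for an example order lookup tool.
interface GetOrderArgs {
  orderId: string;
  includeItems: boolean;
}

// Validates untrusted input and returns typed arguments, or throws with a clear message.
function parseGetOrderArgs(input: unknown): GetOrderArgs {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    throw new Error("arguments must be an object");
  }
  const record = input as Record<string, unknown>;

  const orderId = record["orderId"];
  if (typeof orderId !== "string" || !/^[A-Za-z0-9-]{1,32}$/.test(orderId)) {
    throw new Error("orderId must be 1-32 letters, digits, or hyphens");
  }

  // includeItems is optional and defaults to false.
  const includeItems = record["includeItems"] ?? false;
  if (typeof includeItems !== "boolean") {
    throw new Error("includeItems must be a boolean");
  }

  return { orderId, includeItems };
}
```

Because the function returns a fresh object with only the validated fields, any unexpected extra properties sent by the model are dropped.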