Voice
GuideKit supports half-duplex voice interactions powered by VAD (Voice Activity Detection), STT (Speech-to-Text), and TTS (Text-to-Speech).
Setup
Install the VAD package alongside the core SDK:
npm install @guidekit/vad
Enable voice mode in the provider:
<GuideKitProvider
  tokenEndpoint="/api/guidekit/token"
  agent={{ name: 'Guide' }}
  options={{ mode: 'voice' }}
>
  {children}
</GuideKitProvider>
How It Works
- VAD (@guidekit/vad): a Silero ONNX model detects when the user starts and stops speaking
- STT (Deepgram, ElevenLabs, or Web Speech API): transcribes audio in real time
- LLM (Gemini or custom adapter): processes the transcribed text and generates a response
- TTS (ElevenLabs or Web Speech API): converts the response to speech and plays it back
By default, GuideKit uses the browser-native Web Speech API for STT and TTS, requiring no API keys. For production quality, configure Deepgram/ElevenLabs providers with API keys.
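The hand-off between the four stages above can be pictured as a small state machine. The sketch below is illustrative only: the state names, the signal names, and the `nextState` function are assumptions made for this example, not GuideKit's internal implementation. Barge-in is left out to keep the transition table small.

```typescript
// Hypothetical turn states; GuideKit's real state names may differ.
type TurnState = 'idle' | 'listening' | 'transcribing' | 'thinking' | 'speaking';

// Hypothetical signals, one per stage of the pipeline.
type TurnSignal =
  | 'speech-start'      // VAD: the user started speaking
  | 'speech-end'        // VAD: the user stopped speaking
  | 'transcript-final'  // STT: the final transcript is ready
  | 'response-ready'    // LLM: the response text is ready
  | 'playback-end';     // TTS: audio playback finished

// Transition table: each state reacts to exactly one signal.
const transitions: Record<TurnState, Partial<Record<TurnSignal, TurnState>>> = {
  idle: { 'speech-start': 'listening' },
  listening: { 'speech-end': 'transcribing' },
  transcribing: { 'transcript-final': 'thinking' },
  thinking: { 'response-ready': 'speaking' },
  speaking: { 'playback-end': 'idle' },
};

// Returns the next state, or the current state if the signal does not apply.
function nextState(state: TurnState, signal: TurnSignal): TurnState {
  return transitions[state][signal] ?? state;
}

// Walk through one complete turn, starting and ending in 'idle'.
const signals: TurnSignal[] = [
  'speech-start',
  'speech-end',
  'transcript-final',
  'response-ready',
  'playback-end',
];
const visited: TurnState[] = [];
let state: TurnState = 'idle';
for (const signal of signals) {
  state = nextState(state, signal);
  visited.push(state);
}
console.log(visited.join(' -> '));
```

Keeping the transitions in a table makes it easy to see that only one stage is active at a time, which is what half-duplex means here.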
Voice Events
Subscribe to voice lifecycle events via the useGuideKitVoice hook or the EventBus:
const { isListening, isSpeaking, startListening, stopListening, sendText } = useGuideKitVoice();

| Event | Description |
|---|---|
| voice:state-change | Voice state transition (from/to) |
| voice:transcript | Transcription received (text, isFinal, confidence) |
| voice:tts-start | TTS playback started |
| voice:tts-end | TTS playback finished |
| voice:degraded | Voice degraded to text fallback |
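To make the payloads concrete, here is a minimal sketch of a typed handler for transcript events. The payload field names follow the event table above, but the exact types and the `TinyBus` class are assumptions for illustration; `TinyBus` is a stand-in written for this example, and GuideKit's actual EventBus API may look different.

```typescript
// Assumed payload shapes, inferred from the field names in the event table.
interface VoiceEventPayloads {
  'voice:state-change': { from: string; to: string };
  'voice:transcript': { text: string; isFinal: boolean; confidence: number };
}

type Handler<T> = (payload: T) => void;

// Stand-in for the SDK's EventBus: a tiny typed publish/subscribe helper.
class TinyBus<Events> {
  private handlers = new Map<keyof Events, Set<Handler<any>>>();

  // Registers a handler and returns a function that removes it again.
  on<K extends keyof Events>(event: K, handler: Handler<Events[K]>): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler<any>>();
    set.add(handler);
    this.handlers.set(event, set);
    return () => {
      set.delete(handler);
    };
  }

  // Calls every handler registered for the event.
  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    this.handlers.get(event)?.forEach((handler) => handler(payload));
  }
}

const bus = new TinyBus<VoiceEventPayloads>();
const finalTranscripts: string[] = [];

// Ignore interim results; act only on final transcripts.
const unsubscribe = bus.on('voice:transcript', ({ text, isFinal }) => {
  if (isFinal) finalTranscripts.push(text);
});

bus.emit('voice:transcript', { text: 'hel', isFinal: false, confidence: 0.4 });
bus.emit('voice:transcript', { text: 'hello', isFinal: true, confidence: 0.93 });
unsubscribe();
bus.emit('voice:transcript', { text: 'not recorded', isFinal: true, confidence: 0.9 });

console.log(finalTranscripts);
```

Filtering on `isFinal` avoids reacting to every partial result while the user is still speaking.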
Barge-in
When the user speaks while TTS is playing, the SDK detects barge-in and:
- Stops the current TTS playback
- Captures the new speech via STT
- Sends the new input to the LLM
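The three steps above can be sketched as a single handler that runs when VAD reports the start of speech. The `VoiceDeps` interface and the `onSpeechStart` function are hypothetical names invented for this example; they are not part of the GuideKit API, and the fake implementations exist only to make the sketch runnable.

```typescript
// Hypothetical collaborators for this sketch; not GuideKit types.
interface VoiceDeps {
  tts: { isPlaying(): boolean; stop(): void };
  stt: { listen(onFinal: (text: string) => void): void };
  llm: { send(text: string): void };
}

// Runs when VAD detects that the user started speaking.
// Returns true if the speech interrupted TTS playback (a barge-in).
function onSpeechStart(deps: VoiceDeps): boolean {
  const bargeIn = deps.tts.isPlaying();
  if (bargeIn) {
    deps.tts.stop(); // 1. stop the current TTS playback
  }
  // 2. capture the new speech, then 3. send the final transcript to the LLM
  deps.stt.listen((text) => deps.llm.send(text));
  return bargeIn;
}

// Fake collaborators that record the order of calls.
const calls: string[] = [];
let playing = true;
const deps: VoiceDeps = {
  tts: {
    isPlaying: () => playing,
    stop: () => {
      playing = false;
      calls.push('tts.stop');
    },
  },
  stt: {
    listen: (onFinal) => {
      calls.push('stt.listen');
      onFinal('actually, go back'); // pretend the user finished speaking
    },
  },
  llm: {
    send: (text) => {
      calls.push(`llm.send: ${text}`);
    },
  },
};

const wasBargeIn = onSpeechStart(deps);
console.log(wasBargeIn, calls);
```

Stopping playback before starting capture keeps the assistant's own audio out of the new transcript.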
Degradation
If any voice component fails (microphone denied, WebSocket dropped, etc.), the SDK automatically falls back to text-only mode and emits the relevant error code. Users can continue interacting via text without interruption.
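The fallback behaviour can be sketched as a wrapper around voice startup. Everything here is an assumption made for illustration: `startWithFallback`, the `{ reason }` payload, and the synchronous `startVoice` callback are hypothetical, and real startup (for example, requesting microphone access) would be asynchronous.

```typescript
type InteractionMode = 'voice' | 'text';

// Assumed shape of the degradation event; the real payload may differ.
type Emit = (event: 'voice:degraded', detail: { reason: string }) => void;

// Tries to start voice; on any failure, reports it and falls back to text.
function startWithFallback(startVoice: () => void, emit: Emit): InteractionMode {
  try {
    startVoice();
    return 'voice';
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    emit('voice:degraded', { reason });
    return 'text';
  }
}

// Record emitted events so the outcome can be inspected.
const emitted: string[] = [];
const record: Emit = (event, detail) => {
  emitted.push(`${event} (${detail.reason})`);
};

// A failing start (for example, the microphone was denied) degrades to text.
const degradedMode = startWithFallback(() => {
  throw new Error('microphone permission denied');
}, record);

// A successful start stays in voice mode and emits nothing.
const healthyMode = startWithFallback(() => {}, record);

console.log(degradedMode, healthyMode, emitted);
```

Returning the resulting mode lets the caller keep the text input active in both cases.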
Provider Keys
When you use Deepgram or ElevenLabs for STT or TTS, your token endpoint must also supply the provider API keys:
const token = await createSessionToken({
  signingSecret: process.env.GUIDEKIT_SECRET!,
  llmApiKey: process.env.LLM_API_KEY!,
  sttApiKey: process.env.STT_API_KEY!,
  ttsApiKey: process.env.TTS_API_KEY!,
  expiresIn: '15m',
});
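The non-null assertions (`!`) in the snippet above silence the type checker but do not check anything at runtime. One option is to validate the variables before creating the token. The `requireKeys` helper below is a generic sketch written for this example, not part of GuideKit, and it uses a plain object in place of `process.env` so that it runs on its own.

```typescript
// Returns the requested variables, or throws one error naming every missing one.
function requireKeys<K extends string>(
  env: Record<string, string | undefined>,
  names: readonly K[],
): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const result = {} as Record<K, string>;
  for (const name of names) {
    result[name] = env[name] as string; // safe: presence was checked above
  }
  return result;
}

// Stand-in for process.env with placeholder values.
const exampleEnv: Record<string, string | undefined> = {
  GUIDEKIT_SECRET: 'placeholder-secret',
  LLM_API_KEY: 'placeholder-llm-key',
  STT_API_KEY: 'placeholder-stt-key',
  TTS_API_KEY: 'placeholder-tts-key',
};

const keys = requireKeys(exampleEnv, [
  'GUIDEKIT_SECRET',
  'LLM_API_KEY',
  'STT_API_KEY',
  'TTS_API_KEY',
]);
console.log(Object.keys(keys).length);
```

In a real token endpoint you would pass `process.env` instead of `exampleEnv` and hand the returned values to `createSessionToken`.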