
Voice

GuideKit supports half-duplex voice interactions powered by VAD (Voice Activity Detection), STT (Speech-to-Text), and TTS (Text-to-Speech).

Setup

Install the VAD package alongside the core SDK:

npm install @guidekit/vad

Enable voice mode in the provider:

<GuideKitProvider
  tokenEndpoint="/api/guidekit/token"
  agent={{ name: 'Guide' }}
  options={{ mode: 'voice' }}
>
  {children}
</GuideKitProvider>

How It Works

  1. VAD (@guidekit/vad) — Silero ONNX model detects when the user starts and stops speaking
  2. STT (Deepgram, ElevenLabs, or Web Speech API) — Transcribes audio in real-time
  3. LLM (Gemini or custom adapter) — Processes the transcribed text and generates a response
  4. TTS (ElevenLabs or Web Speech API) — Converts the response to speech and plays it back

By default, GuideKit uses the browser-native Web Speech API for STT and TTS, requiring no API keys. For production quality, configure Deepgram/ElevenLabs providers with API keys.
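The four stages above can be pictured as one half-duplex turn. The following is a minimal sketch, not GuideKit's actual implementation: `AudioFrame`, `VoiceStages`, and `runTurn` are hypothetical names, the `isSpeech` flag stands in for the output of the VAD model, and the stages are synchronous here only to keep the example short (real STT, LLM, and TTS calls would be asynchronous).

```typescript
// Hypothetical types for illustration; GuideKit's real adapter interfaces may differ.
type AudioFrame = { samples: number[]; isSpeech: boolean };

interface VoiceStages {
  transcribe(frames: AudioFrame[]): string; // STT
  respond(text: string): string;            // LLM
  speak(text: string): void;                // TTS
}

// Runs one half-duplex turn: gate frames by VAD, then transcribe, respond, and speak.
// Returns the spoken reply, or null when no speech was detected.
function runTurn(frames: AudioFrame[], stages: VoiceStages): string | null {
  // VAD step: keep only the frames flagged as speech.
  const speech = frames.filter((frame) => frame.isSpeech);
  if (speech.length === 0) return null;

  const transcript = stages.transcribe(speech);
  const reply = stages.respond(transcript);
  stages.speak(reply);
  return reply;
}
```

Because each stage is passed in as an interface, swapping the Web Speech API for Deepgram or ElevenLabs would only change the object handed to `runTurn` in this sketch, not the turn logic itself.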

Voice Events

Subscribe to voice lifecycle events via the useGuideKitVoice hook or the EventBus:

const { isListening, isSpeaking, startListening, stopListening, sendText } = useGuideKitVoice();

| Event | Description |
| --- | --- |
| voice:state-change | Voice state transition (from/to) |
| voice:transcript | Transcription received (text, isFinal, confidence) |
| voice:tts-start | TTS playback started |
| voice:tts-end | TTS playback finished |
| voice:degraded | Voice degraded to text fallback |
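To make the payload fields listed above concrete, here is a sketch of a typed event map with a tiny stand-in bus. `TinyBus` is a hypothetical class written for this example and is not GuideKit's EventBus API; only the event names and the fields `from`, `to`, `text`, `isFinal`, and `confidence` come from the table, and the field types are assumptions.

```typescript
// Event names and fields follow the table; field types are assumed.
type VoiceEvents = {
  'voice:state-change': { from: string; to: string };
  'voice:transcript': { text: string; isFinal: boolean; confidence: number };
  'voice:tts-start': unknown; // payload not documented in the table
  'voice:tts-end': unknown;
  'voice:degraded': unknown;
};

// Hypothetical stand-in for an event bus, just enough to demonstrate typed handlers.
class TinyBus {
  private handlers = new Map<string, Array<(payload: any) => void>>();

  on<K extends keyof VoiceEvents>(event: K, handler: (payload: VoiceEvents[K]) => void): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit<K extends keyof VoiceEvents>(event: K, payload: VoiceEvents[K]): void {
    for (const handler of this.handlers.get(event) ?? []) handler(payload);
  }
}

// Usage: keep only final transcripts and ignore interim results.
const bus = new TinyBus();
const finals: string[] = [];
bus.on('voice:transcript', (t) => {
  if (t.isFinal) finals.push(t.text);
});
bus.emit('voice:transcript', { text: 'hel', isFinal: false, confidence: 0.4 });
bus.emit('voice:transcript', { text: 'hello', isFinal: true, confidence: 0.93 });
```

Filtering on `isFinal` is a common pattern with streaming STT, where interim transcripts are revised until the final one arrives.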

Barge-in

When the user speaks while TTS is playing, the SDK detects barge-in and:

  1. Stops the current TTS playback
  2. Captures the new speech via STT
  3. Sends the new input to the LLM
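The barge-in steps above can be sketched as a small state transition. This is an illustration under assumptions rather than the SDK's internal logic: the state names, `BargeInDeps`, and `onSpeechStart` are hypothetical.

```typescript
// Hypothetical voice states; the SDK's actual state names may differ.
type VoiceState = 'idle' | 'listening' | 'processing' | 'speaking';

interface BargeInDeps {
  stopTts(): void;  // halts the current TTS playback
  startStt(): void; // begins capturing the new utterance
}

// Called when VAD reports that the user started speaking.
function onSpeechStart(state: VoiceState, deps: BargeInDeps): VoiceState {
  if (state === 'speaking') {
    deps.stopTts();  // step 1: stop the current playback
    deps.startStt(); // step 2: capture the new speech
    return 'listening';
  }
  if (state === 'idle') {
    deps.startStt();
    return 'listening';
  }
  // Already listening or processing: nothing to interrupt.
  return state;
}
```

Step 3, sending the new input to the LLM, would happen later in this sketch, once STT produces a final transcript for the interrupting utterance.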

Degradation

If any voice component fails (microphone denied, WebSocket dropped, etc.), the SDK automatically falls back to text-only mode and emits the relevant error code. Users can continue interacting via text without interruption.
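As a rough picture of this fallback, the sketch below tries to start voice and switches the session to text mode when a component throws. Everything here is assumed for illustration: `Session`, `startVoice`, the `code` payload field, and the `MIC_DENIED` string are hypothetical and are not GuideKit's documented error codes or API.

```typescript
type Mode = 'voice' | 'text';

interface Session {
  mode: Mode;
  lastError?: string; // hypothetical field holding the most recent error code
}

// Attempts to start voice; on failure, falls back to text mode and reports the code.
function startVoice(
  session: Session,
  acquireMic: () => void, // stand-in for any voice component that may fail
  emit: (event: string, payload: { code: string }) => void
): Session {
  try {
    acquireMic();
    return { ...session, mode: 'voice' };
  } catch (err) {
    // The payload shape for voice:degraded is an assumption.
    const code = err instanceof Error ? err.message : 'UNKNOWN';
    emit('voice:degraded', { code });
    return { mode: 'text', lastError: code };
  }
}
```

A UI could listen for the degraded event and show the text input, so that `sendText` remains usable after the fallback.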

Provider Keys

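Hosted STT and TTS providers (Deepgram, ElevenLabs) require additional API keys, which you pass when creating the session token in your token endpoint. The default Web Speech API needs none.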

const token = await createSessionToken({
  signingSecret: process.env.GUIDEKIT_SECRET!,
  llmApiKey: process.env.LLM_API_KEY!,
  sttApiKey: process.env.STT_API_KEY!,
  ttsApiKey: process.env.TTS_API_KEY!,
  expiresIn: '15m',
});