
Architecture

GuideKit is built as a modular monorepo with clear separation of concerns. This page covers the internal architecture.

Package Structure

guidekit/
  packages/
    core/                # @guidekit/core: all runtime logic
      src/
        dom/             # DOM Intelligence Engine
        awareness/       # User Awareness System
        voice/           # Voice Pipeline (STT + TTS)
        visual/          # Visual Guidance (spotlight, tooltips)
        navigation/      # Navigation Controller
        context/         # Context Manager (LLM prompt assembly)
        llm/             # LLM Orchestrator (streaming, tool calls)
        bus/             # Typed EventBus
        errors/          # Error hierarchy
        resources/       # Resource lifecycle manager
        connectivity/    # ConnectionManager
        i18n/            # Internationalization
        types/           # Shared TypeScript types
    vad/                 # @guidekit/vad: Silero ONNX VAD model
    react/               # @guidekit/react: Provider + hooks + widget
    server/              # @guidekit/server: JWT tokens, middleware
    vanilla/             # @guidekit/vanilla: IIFE script-tag bundle
    cli/                 # @guidekit/cli: CLI tools
  apps/
    docs/                # Documentation site (Nextra)
    example-nextjs/      # Reference Next.js integration

Core Subsystems

DOM Intelligence Engine

Builds a compact PageModel (~5KB) using TreeWalker traversal:

  • Section scoring: Viewport-visible sections scored highest
  • data-guidekit-target: Developer-provided stable selectors
  • data-guidekit-ignore: Skip sensitive DOM subtrees
  • Budget: 5000-node limit, yields via requestIdleCallback with adaptive time budget
  • MutationObserver: Debounced (500ms), circuit breaker at 100 mutations/sec
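The mutation circuit breaker can be pictured as a rolling counter. The following is a minimal sketch, not GuideKit's implementation: `MutationCircuitBreaker`, `MutationBatch`, and `record` are hypothetical names, and it assumes the breaker trips once more than 100 mutations fall inside a one-second window.

```typescript
// Hypothetical helper, not part of the GuideKit API.
interface MutationBatch {
  at: number;    // timestamp in ms when the batch was observed
  count: number; // number of MutationRecords in the batch
}

class MutationCircuitBreaker {
  private batches: MutationBatch[] = [];
  private total = 0;
  private readonly maxPerWindow: number;
  private readonly windowMs: number;

  // Defaults mirror the limit listed above; the window length is an assumption.
  constructor(maxPerWindow = 100, windowMs = 1000) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  /** Records a batch and returns true while rescans are still allowed. */
  record(count: number, now: number): boolean {
    this.batches.push({ at: now, count });
    this.total += count;
    // Drop batches that have fallen out of the rolling window.
    while (this.batches.length > 0 && this.batches[0].at <= now - this.windowMs) {
      this.total -= this.batches[0].count;
      this.batches.shift();
    }
    return this.total <= this.maxPerWindow;
  }
}
```

In a browser, `record(records.length, performance.now())` would be called from the `MutationObserver` callback, and the debounced rescan would be skipped while it returns `false`.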

LLM Orchestrator

Streaming multi-turn conversation with tool calling:

  • Provider adapters: Gemini (default), custom adapters via LLMProviderAdapter
  • Tool execution: Multi-turn loop with parallel tool calls
  • Context window: Sliding window, summarize oldest turns at 80% capacity
  • Content filter: Retry with simplified prompt, graceful user message
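The sliding-window rule can be expressed as a pure function over the turn history. This is a sketch under stated assumptions: `Turn`, `compactHistory`, the `keepRecent` parameter, and the injected `summarize` callback are illustrative names, and token counts are taken as given rather than computed.

```typescript
// Illustrative types; the real orchestrator's data model may differ.
interface Turn {
  role: "user" | "assistant" | "summary";
  text: string;
  tokens: number;
}

/**
 * When usage reaches `threshold` of `capacity`, replaces the oldest turns
 * with a single summary turn and keeps the most recent `keepRecent` turns.
 */
function compactHistory(
  turns: Turn[],
  capacity: number,
  summarize: (oldest: Turn[]) => Turn,
  threshold = 0.8, // 80% capacity, as listed above
  keepRecent = 4,  // assumption: how many recent turns stay verbatim
): Turn[] {
  const used = turns.reduce((sum, turn) => sum + turn.tokens, 0);
  if (used < capacity * threshold || turns.length <= keepRecent) {
    return turns;
  }
  const oldest = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(turns.length - keepRecent);
  return [summarize(oldest), ...recent];
}
```

Passing `summarize` in as a callback keeps the function synchronous and testable; in practice the summary would likely come from another LLM call.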

Voice Pipeline

Half-duplex voice with echo cancellation:

Mic → VAD (Silero) → STT (Deepgram / Web Speech API) → LLM (Gemini) → TTS (ElevenLabs / Web Speech API) → Speaker
  • State machine: IDLE → LISTENING → PROCESSING → SPEAKING → IDLE
  • Barge-in: Detect speech during TTS, abort stream, switch to LISTENING
  • Echo detection: 60% word overlap within 3-second window
  • Degradation: On voice failure, fall back to text-only mode (not to the Web Speech API)
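The state machine and the echo check above can be sketched as a transition table plus a word-overlap function. The event names, `nextState`, and `isLikelyEcho` are assumptions made for illustration. The sketch also assumes the caller passes only the TTS text spoken within the last 3 seconds, and that overlap is measured against the words in the incoming transcript.

```typescript
type VoiceState = "IDLE" | "LISTENING" | "PROCESSING" | "SPEAKING";

// Hypothetical event names; the source only names the states.
type VoiceEvent =
  | "start"
  | "utterance-end"
  | "response-ready"
  | "playback-end"
  | "speech-detected";

const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  IDLE: { start: "LISTENING" },
  LISTENING: { "utterance-end": "PROCESSING" },
  PROCESSING: { "response-ready": "SPEAKING" },
  // "speech-detected" while speaking is the barge-in path.
  SPEAKING: { "playback-end": "IDLE", "speech-detected": "LISTENING" },
};

/** Returns the next state, or the current one if the event does not apply. */
function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;
}

/** True when at least `threshold` of the heard words also occur in the recent TTS text. */
function isLikelyEcho(transcript: string, recentTtsText: string, threshold = 0.6): boolean {
  const words = (s: string): string[] => s.toLowerCase().split(/\W+/).filter(Boolean);
  const heard = words(transcript);
  if (heard.length === 0) return false;
  const spoken = new Set(words(recentTtsText));
  const overlap = heard.filter((word) => spoken.has(word)).length;
  return overlap / heard.length >= threshold;
}
```

One plausible wiring is to run `isLikelyEcho` before dispatching `speech-detected`, so the assistant's own voice picked up by the microphone does not trigger a barge-in.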

Visual Guidance

Spotlight overlay and tooltip system:

  • Spotlight: Box-shadow cutout on document.body (outside Shadow DOM)
  • Tracking: ResizeObserver + scroll listeners (not rAF polling)
  • Scrollable containers: Detect ancestor overflow, clip to intersection
  • Tours: Sequential guided tour with auto/manual modes
  • Accessibility: aria-live="assertive" announcements for screen readers
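Clipping the spotlight to scrollable ancestors comes down to rectangle intersection. The sketch below assumes the rectangles have already been measured (for example with `getBoundingClientRect`); `Rect`, `intersect`, and `clipToAncestors` are illustrative names rather than GuideKit exports.

```typescript
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

/** Returns the overlapping area of two rectangles, or null if they do not overlap. */
function intersect(a: Rect, b: Rect): Rect | null {
  const left = Math.max(a.left, b.left);
  const top = Math.max(a.top, b.top);
  const right = Math.min(a.right, b.right);
  const bottom = Math.min(a.bottom, b.bottom);
  return right > left && bottom > top ? { left, top, right, bottom } : null;
}

/**
 * Clips the target rectangle against every scrollable ancestor.
 * A null result means the target is fully scrolled out of view.
 */
function clipToAncestors(target: Rect, ancestors: Rect[]): Rect | null {
  let visible: Rect | null = target;
  for (const ancestor of ancestors) {
    if (visible === null) break;
    visible = intersect(visible, ancestor);
  }
  return visible;
}
```

A `null` result is a natural signal to hide the spotlight or scroll the target into view first.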

User Awareness

Passive signals for proactive behavior:

| Signal | Method | Throttle |
| --- | --- | --- |
| Viewport/scroll | scroll (passive) | per-frame |
| Visible sections | IntersectionObserver | on change |
| Mouse region | elementFromPoint | 200ms |
| Dwell detection | 8s on same section | 8s |
| Idle detection | 60s no interaction | 60s |
| Rage clicks | 3+ clicks in 2s | per-event |
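As an illustration of the rage-click signal, the detector below keeps the click timestamps from the last 2 seconds and fires at 3 or more. `RageClickDetector` and `onClick` are hypothetical names, and this sketch ignores where the clicks land; a real detector would probably also require them to hit the same element or region.

```typescript
// Hypothetical helper, not part of the GuideKit API.
class RageClickDetector {
  private clicks: number[] = [];
  private readonly minClicks: number;
  private readonly windowMs: number;

  // Defaults follow the "3+ clicks in 2s" rule from the table.
  constructor(minClicks = 3, windowMs = 2000) {
    this.minClicks = minClicks;
    this.windowMs = windowMs;
  }

  /** Records a click at `now` (ms) and returns true if it completes a rage-click burst. */
  onClick(now: number): boolean {
    this.clicks.push(now);
    // Keep only the clicks that are still inside the window.
    this.clicks = this.clicks.filter((t) => now - t <= this.windowMs);
    return this.clicks.length >= this.minClicks;
  }
}
```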

Security Model

Token-Based Auth

Client → POST /api/guidekit/token → JWT (sessionId, permissions, exp)
Client → JWT to server middleware → Server looks up provider keys → Proxy to provider
  • JWT tokens do NOT contain API keys (a JWT payload is only base64url-encoded, so anyone holding a token can read it)
  • Provider keys stay server-side in an in-memory session store
  • Tokens refresh at 80% of TTL, multi-tab coordination via BroadcastChannel
  • Signing secret rotation: accepts array [newSecret, oldSecret]
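The refresh timing and the secret rotation can be sketched without committing to a JWT library. Both helpers are assumptions made for illustration: `refreshDelayMs` supposes the client knows when the token was issued (the source lists only `exp` in the payload), and `verifyWithRotation` takes the actual verifier as a parameter, which on the server could wrap `jwtVerify` from jose.

```typescript
/** Milliseconds to wait before refreshing, so the refresh lands at `fraction` of the TTL. */
function refreshDelayMs(
  issuedAtSec: number,  // assumption: issue time is known to the client
  expiresAtSec: number, // the `exp` claim, in seconds
  nowMs: number,
  fraction = 0.8,       // 80% of TTL, as listed above
): number {
  const ttlMs = (expiresAtSec - issuedAtSec) * 1000;
  const refreshAtMs = issuedAtSec * 1000 + ttlMs * fraction;
  return Math.max(0, refreshAtMs - nowMs);
}

/**
 * Tries each signing secret in order (for example [newSecret, oldSecret])
 * and returns the first successful verification result.
 */
async function verifyWithRotation<T>(
  token: string,
  secrets: string[],
  verify: (token: string, secret: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("no signing secrets configured");
  for (const secret of secrets) {
    try {
      return await verify(token, secret);
    } catch (err) {
      lastError = err; // fall through to the next secret
    }
  }
  throw lastError;
}
```

Listing the new secret first means freshly issued tokens verify on the first attempt, while tokens signed before the rotation remain valid until they expire.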

Privacy

| Data | Leaves Browser? | Storage |
| --- | --- | --- |
| Audio chunks | Yes → STT (ephemeral) | Not stored |
| Transcripts | Yes → LLM (ephemeral) | Not stored |
| PageModel | Yes → LLM (ephemeral) | Not stored |
| Full DOM | No | Browser only |
| Mouse/scroll | No | Browser only |
| Form values | Never | Stripped |

Protections

  • data-guidekit-ignore: Skip sensitive subtrees
  • onBeforeLLMCall: Developer privacy hook for custom scrubbing
  • Tooltip textContent only (never innerHTML — prevents XSS)
  • Client-side rate limits (cost protection, not security)
  • Password/email/phone inputs: values never included in PageModel
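A privacy hook such as `onBeforeLLMCall` might be used for scrubbing along these lines. The hook's real signature is not shown on this page, so `LLMCallContext`, `BeforeLLMCallHook`, and the field names below are assumptions; only the redaction logic is meant to carry over.

```typescript
// Assumed shape of the data passed to the hook; the real signature may differ.
interface LLMCallContext {
  userMessage: string;
  pageText: string;
}

type BeforeLLMCallHook = (ctx: LLMCallContext) => LLMCallContext;

const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const LONG_DIGITS = /\b\d{13,19}\b/g; // card-number-like digit runs

/** Replaces email addresses and long digit runs with placeholders. */
function redact(text: string): string {
  return text
    .replace(EMAIL, "[redacted-email]")
    .replace(LONG_DIGITS, "[redacted-number]");
}

/** Example hook: scrubs both the user's message and the page text before they leave the browser. */
const scrubPII: BeforeLLMCallHook = (ctx) => ({
  ...ctx,
  userMessage: redact(ctx.userMessage),
  pageText: redact(ctx.pageText),
});
```

Pattern-based scrubbing is best-effort, so it complements `data-guidekit-ignore` rather than replacing it.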

Error Handling

All errors extend GuideKitError with actionable guidance:

class GuideKitError extends Error {
  code: string;
  provider?: string;
  recoverable: boolean;
  suggestion: string;
  docsUrl: string;
}

Error types: AuthenticationError, ConfigurationError, NetworkError, TimeoutError, RateLimitError, PermissionError, BrowserSupportError, ContentFilterError, ResourceExhaustedError, InitializationError.
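To show how the shared fields can drive handling, here is a self-contained sketch that re-declares a `GuideKitError` with the fields above. The constructor shape, the `RATE_LIMIT` code, the `retryAfterMs` field, the placeholder docs URL, and `toUserMessage` are all assumptions made for illustration.

```typescript
interface GuideKitErrorOptions {
  code: string;
  recoverable: boolean;
  suggestion: string;
  docsUrl: string;
  provider?: string;
}

// Stand-in with the documented fields; the constructor signature is assumed.
class GuideKitError extends Error {
  code: string;
  provider?: string;
  recoverable: boolean;
  suggestion: string;
  docsUrl: string;

  constructor(message: string, options: GuideKitErrorOptions) {
    super(message);
    this.name = new.target.name;
    this.code = options.code;
    this.provider = options.provider;
    this.recoverable = options.recoverable;
    this.suggestion = options.suggestion;
    this.docsUrl = options.docsUrl;
  }
}

class RateLimitError extends GuideKitError {
  retryAfterMs: number; // hypothetical extra field

  constructor(provider: string, retryAfterMs: number) {
    super(`Rate limit reached for ${provider}.`, {
      code: "RATE_LIMIT", // hypothetical code value
      recoverable: true,
      suggestion: `Retry in ${Math.ceil(retryAfterMs / 1000)}s.`,
      docsUrl: "https://example.com/docs/errors#rate-limit", // placeholder URL
      provider,
    });
    this.retryAfterMs = retryAfterMs;
  }
}

/** Turns any thrown value into a message that is safe to show to the user. */
function toUserMessage(err: unknown): string {
  if (!(err instanceof GuideKitError)) return "Something went wrong.";
  return err.recoverable
    ? `${err.message} ${err.suggestion}`
    : `${err.message} See ${err.docsUrl}`;
}
```

Branching on `recoverable` instead of on the concrete class keeps call sites stable as new error types are added.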

EventBus

Typed pub/sub with namespace support:

bus.on('dom:scan-complete', handler);
bus.on('llm:*', handler);              // Namespace wildcard
bus.onAny(handler);                    // All events
const unsub = bus.on('error', handler);
unsub();                               // Cleanup

Error isolation: one handler throwing does not prevent others from executing.
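The behaviour described above (exact subscriptions, namespace wildcards, `onAny`, unsubscribe functions, and error isolation) fits in a small class. This is a simplified sketch rather than the GuideKit EventBus: it is untyped over event names, and the `emit` method and the `onHandlerError` callback are assumed names.

```typescript
type Handler = (event: string, payload: unknown) => void;

// Simplified stand-in: no per-event payload typing.
class EventBus {
  private handlers = new Map<string, Set<Handler>>();
  private anyHandlers = new Set<Handler>();
  private onHandlerError: (err: unknown) => void;

  constructor(onHandlerError: (err: unknown) => void = () => {}) {
    this.onHandlerError = onHandlerError;
  }

  /** Subscribes to an exact event or a namespace wildcard such as "llm:*". */
  on(pattern: string, handler: Handler): () => void {
    const set = this.handlers.get(pattern) ?? new Set<Handler>();
    this.handlers.set(pattern, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  /** Subscribes to every event. */
  onAny(handler: Handler): () => void {
    this.anyHandlers.add(handler);
    return () => {
      this.anyHandlers.delete(handler);
    };
  }

  emit(event: string, payload?: unknown): void {
    const wildcard = event.includes(":") ? `${event.split(":")[0]}:*` : null;
    const targets: Handler[] = [
      ...(this.handlers.get(event) ?? []),
      ...(wildcard ? this.handlers.get(wildcard) ?? [] : []),
      ...this.anyHandlers,
    ];
    for (const handler of targets) {
      try {
        handler(event, payload);
      } catch (err) {
        // Error isolation: report and keep delivering to the remaining handlers.
        this.onHandlerError(err);
      }
    }
  }
}
```

Copying the handlers into an array before dispatch also means a handler that unsubscribes itself mid-emit does not disturb the iteration.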

Resource Management

All allocated resources tracked via ResourceManager:

  • AbortController signal pattern for event listeners
  • Ref-counted singleton (StrictMode safe)
  • Per-resource 2s cleanup timeout on teardown
  • Instance isolation via instanceId
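A ref-counted singleton can be sketched as follows. `RefCounted`, `acquire`, and `release` are illustrative names, and the claim about StrictMode rests on an assumption: that each effect run calls `acquire` and each cleanup calls `release`, so React's development-time double invocation stays balanced.

```typescript
// Hypothetical helper, not part of the GuideKit API.
class RefCounted<T> {
  private instance: T | undefined;
  private refs = 0;
  private readonly create: () => T;
  private readonly destroy: (instance: T) => void;

  constructor(create: () => T, destroy: (instance: T) => void) {
    this.create = create;
    this.destroy = destroy;
  }

  /** Returns the shared instance, creating it on first use. */
  acquire(): T {
    if (this.refs === 0) {
      this.instance = this.create();
    }
    this.refs++;
    return this.instance as T;
  }

  /** Drops one reference and tears the instance down when none remain. */
  release(): void {
    if (this.refs === 0) return; // tolerate an unbalanced release
    this.refs--;
    if (this.refs === 0) {
      this.destroy(this.instance as T);
      this.instance = undefined;
    }
  }
}
```

With this shape, a mount, cleanup, mount sequence never leaves two live instances or a leaked one, although it does recreate the instance; deferring the teardown briefly would avoid that cost.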

LLM Tools

| Tool | Description |
| --- | --- |
| highlight | Spotlight an element |
| dismissHighlight | Remove spotlight |
| scrollToSection | Smooth scroll |
| navigate | SPA navigation (same-origin) |
| startTour | Sequential guided tour |
| readPageContent | Read visible text |
| getVisibleSections | What’s in viewport |
| clickElement | Programmatic click (whitelisted) |
| executeCustomAction | Developer-defined actions |
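Dispatching the tool calls from one LLM turn in parallel might look like the sketch below. `ToolCall`, `ToolResult`, `ToolHandler`, and `runToolCalls` are assumed names, and the handlers registered in the usage example are stand-ins rather than the real tool implementations.

```typescript
// Assumed shapes; the orchestrator's real types may differ.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolResult {
  id: string;
  ok: boolean;
  output: unknown;
}

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown> | unknown;

/**
 * Runs all tool calls concurrently. Failures are converted into results
 * so one failing tool does not reject the whole batch.
 */
async function runToolCalls(
  calls: ToolCall[],
  registry: Map<string, ToolHandler>,
): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      const handler = registry.get(call.name);
      if (!handler) {
        return { id: call.id, ok: false, output: `unknown tool: ${call.name}` };
      }
      try {
        return { id: call.id, ok: true, output: await handler(call.args) };
      } catch (err) {
        return { id: call.id, ok: false, output: String(err) };
      }
    }),
  );
}

// Usage with stand-in handlers:
const registry = new Map<string, ToolHandler>();
registry.set("highlight", (args) => `highlighted ${String(args["selector"])}`);
registry.set("clickElement", () => {
  throw new Error("element is not whitelisted");
});
```

Returning failures as results lets the next LLM turn see what went wrong and recover, for example by choosing a different element.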

Technology Stack

| Layer | Choice |
| --- | --- |
| Build | tsup (esbuild), pnpm + Turborepo |
| LLM | Gemini 2.5 Flash (default) |
| STT | Deepgram Nova-3 (WebSocket), Web Speech API |
| TTS | ElevenLabs Flash v2.5 (WebSocket), Web Speech API |
| VAD | Silero via ONNX Runtime |
| UI | Shadow DOM |
| Auth | JWT (HS256 via jose) |
| Testing | Vitest + Playwright |