The GD Chatbot Accuracy System
🎯 Chatbot Accuracy Systems v2.2.0
Core Principle
“Multiple sources of truth, cross-verified and disambiguated, with explicit guardrails against common errors.”
📊 System Overview
The GD Chatbot uses an eight-layer accuracy system to give users accurate, reliable, and comprehensive information about the Grateful Dead. Each layer serves a specific purpose, and together they prevent misinformation, resolve ambiguities, and provide verified facts.
- Accuracy Layers: 8
- Songs Detected: 600+
- Shows Indexed: 2,340
- Disambiguated Terms: 125+
- Context Files: 55+
- Verified Sources: 50+
🏗️ Multi-Layer Architecture
User Query
↓
[1] Disambiguation Layer ────→ Resolve ambiguous terms
↓
[2] Content Sanitization ────→ Filter incorrect data
↓
[3] Knowledge Base ──────────→ 8 core topic files (60KB+)
↓
[4] Context Files ───────────→ Specialized detailed data (55+ files)
↓
[5] Pinecone Vector DB ──────→ Semantic search (optional)
↓
[6] Tavily Web Search ───────→ Current information (always on)
↓
[7] Token Optimization ──────→ Intent-based context budgeting
↓
[8] System Prompt Guardrails → Enforce accuracy rules
↓
Claude AI Processing
↓
Verified Response
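The ordering in the diagram can be sketched as a simple layer registry. This is only an illustration: the `Layer` class and `active_layers` function are hypothetical names, and the optional flags follow the labels used on this page (Pinecone and Token Optimization optional, Tavily always on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One stage of the accuracy pipeline (hypothetical structure)."""
    name: str
    optional: bool = False


# Order mirrors the architecture diagram above.
LAYERS = [
    Layer("Disambiguation"),
    Layer("Content Sanitization"),
    Layer("Knowledge Base"),
    Layer("Context Files"),
    Layer("Pinecone Vector DB", optional=True),
    Layer("Tavily Web Search"),
    Layer("Token Optimization", optional=True),
    Layer("System Prompt Guardrails"),
]


def active_layers(enabled_optional: set[str]) -> list[str]:
    """Return layer names in pipeline order, skipping optional layers that are off."""
    return [
        layer.name
        for layer in LAYERS
        if not layer.optional or layer.name in enabled_optional
    ]
```

With no optional layers enabled, six layers run; enabling both optional layers restores all eight in the same order.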
🔍 The Eight Layers
Disambiguation Layer
Purpose: Resolve ambiguous terms before processing
Coverage: 125+ disambiguated terms across 19 categories
Examples:
- “The Matrix” → San Francisco venue (not the movie)
- “Tiger” → Jerry’s guitar (not the animal)
- “The Archive” → UCSC collection (not Internet Archive)
- “GDP” → Grateful Dead Productions (not economics)
Benefit: Prevents context confusion and ensures correct interpretation
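A minimal sketch of how such a lookup might work, assuming the terms live in a simple dictionary. The `DISAMBIGUATION` table and `disambiguation_notes` function are hypothetical names, and only the four examples listed above are included.

```python
import re

# Hypothetical excerpt of the term table; the real one covers 125+ terms.
DISAMBIGUATION = {
    "the matrix": "The Matrix (San Francisco venue, not the movie)",
    "tiger": "Tiger (Jerry's guitar, not the animal)",
    "the archive": "The Archive (UCSC collection, not Internet Archive)",
    "gdp": "GDP (Grateful Dead Productions, not economics)",
}


def disambiguation_notes(query: str) -> list[str]:
    """Return a clarification note for each known ambiguous term in the query."""
    lowered = query.lower()
    notes = []
    for term, note in DISAMBIGUATION.items():
        # Whole-word match, so "tiger" does not fire inside "tigers".
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            notes.append(note)
    return notes
```

The returned notes could then be attached to the prompt so later layers interpret the term correctly.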
Content Sanitization
Purpose: Filter out incorrect or conflicting information
Special Case: The Bahr Gallery
- All incorrect location references removed from knowledge base
- Exclusive source: bahr-gallery.md
- Location always: Oyster Bay, Long Island, NY
- Triple-layer protection: Sanitization + Injection + System Prompt
Benefit: Eliminates common errors (e.g., Bahr Gallery in San Francisco)
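The sanitization step for this special case might look like the sketch below. It assumes the only wrong location to scrub is the San Francisco one mentioned above; `WRONG_LOCATIONS` and `sanitize_bahr_gallery` are hypothetical names, not the plugin's actual code.

```python
import re

CANONICAL_LOCATION = "Oyster Bay, Long Island, NY"
# Assumption: only the error named on this page is listed here.
WRONG_LOCATIONS = ["San Francisco"]


def sanitize_bahr_gallery(passage: str) -> str:
    """Correct known-wrong locations, but only in sentences about the Bahr Gallery."""
    sentences = re.split(r"(?<=[.!?])\s+", passage)
    cleaned = []
    for sentence in sentences:
        if "bahr gallery" in sentence.lower():
            for wrong in WRONG_LOCATIONS:
                sentence = re.sub(
                    re.escape(wrong), CANONICAL_LOCATION, sentence, flags=re.IGNORECASE
                )
        cleaned.append(sentence)
    return " ".join(cleaned)
```

Restricting the replacement to sentences that mention the gallery leaves legitimate San Francisco references (such as The Matrix venue) untouched.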
Knowledge Base System
Structure: 8 focused topic files in context/core/
| File | Content | Size |
|---|---|---|
| band-and-history.md | Formation, evolution, members, eras | ~12KB |
| books-and-literature.md | Essential bibliography | ~9KB |
| culture-and-community.md | Deadhead culture, philosophy | ~10KB |
| equipment.md | Instruments, Wall of Sound | ~6KB |
| galleries-and-art.md | Art galleries, museums | ~3KB |
| music-and-recordings.md | Song catalog, discography | ~7KB |
| resources-and-media.md | Online communities, URLs | ~12KB |
| terminology.md | 125+ disambiguated terms | ~8KB |
Benefit: Organized, topic-focused knowledge for better AI comprehension
Context Files Integration
Structure: 55+ specialized files across 5 subdirectories
📅 Setlist Database (2,340 Shows)
- 31 CSV files (1965-1995, one per year)
- Complete setlists for every show
- Venue names and locations
- Segue information (e.g., “Scarlet > Fire”)
🎵 Song Database (605 Songs)
- Song titles and composers
- First performance dates
- Performance frequency
- Album appearances
🎸 Equipment Database
- Instrument specifications
- Ownership history
- Technical details
- Usage periods
🎤 Interview Archives
- Direct quotes from band members
- Interview URLs and sources
- Historical context from primary sources
🏛️ UC Santa Cruz Archive
- Official archive documentation
- Collection descriptions
- Research resources
Benefit: Deep-dive accuracy with specialized, verified data sources
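As a rough illustration of how a yearly setlist CSV could be queried, the sketch below assumes a column layout (`date`, `venue`, `location`, `setlist`) and a `;` separator between segue chains. Both are assumptions, since the actual file format is not documented here, and the sample row is abbreviated.

```python
import csv
import io

# Hypothetical layout and an abbreviated, illustrative row for 5/8/77.
SAMPLE_1977_CSV = """date,venue,location,setlist
1977-05-08,Barton Hall,"Ithaca, NY","Scarlet Begonias > Fire on the Mountain; Estimated Prophet"
"""


def load_shows(csv_text: str) -> dict:
    """Parse setlist rows into a dict keyed by show date."""
    shows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ";" separates chains of songs; ">" marks a segue within a chain.
        chains = [
            [song.strip() for song in chain.split(">")]
            for chain in row["setlist"].split(";")
        ]
        shows[row["date"]] = {
            "venue": row["venue"],
            "location": row["location"],
            "chains": chains,
        }
    return shows


def was_played(shows: dict, date: str, song: str):
    """True/False if the show is indexed; None if the date is unknown."""
    show = shows.get(date)
    if show is None:
        return None  # unknown show: the caller should not guess
    return any(song.lower() == s.lower() for chain in show["chains"] for s in chain)
```

Returning `None` for an unindexed date keeps "not played" distinct from "no data", which matters for the rule against inventing setlists.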
Pinecone Vector Database (Optional)
Purpose: Semantic search using AI embeddings
How It Works:
- Converts knowledge into vector embeddings
- Finds semantically similar content
- Returns top-K most relevant results
- Works with natural language queries
Example: The query “Jerry’s favorite guitar” retrieves content about Tiger and Wolf even though neither name appears in the query
Benefit: Finds relevant context even with different wording
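The top-K idea can be shown without any vector database. The sketch below is not the Pinecone client API; it is a plain-Python cosine-similarity ranking over a toy index, and the three-dimensional vectors and document IDs are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict, k: int = 2) -> list[str]:
    """Return the IDs of the k vectors most similar to the query."""
    ranked = sorted(
        index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy embeddings (invented): guitar content clusters on the first axis.
TOY_INDEX = {
    "tiger-guitar": [0.9, 0.1, 0.0],
    "wolf-guitar": [0.8, 0.2, 0.1],
    "setlists-1977": [0.0, 0.1, 0.9],
}

# Stand-in for an embedded "Jerry's favorite guitar" query.
query_vec = [0.85, 0.15, 0.05]
results = top_k(query_vec, TOY_INDEX, k=2)
```

Here both guitar documents rank above the setlist document because their vectors point in nearly the same direction as the query, regardless of shared keywords.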
Tavily Web Search (Always On)
Purpose: Real-time information from trusted sources
Features:
- Trusted Domain Filtering: 50+ pre-approved Grateful Dead websites
- Search Depth: Basic (faster) or Advanced (thorough)
- Max Results: 3-10 results per search
- Always Current: Latest news, events, releases
Trusted Domains Include:
- dead.net (official site)
- archive.org (live recordings)
- deaddisc.com (discography)
- jerrybase.com (Jerry Garcia)
- And 45+ more verified sources
Benefit: Current information with source verification
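Trusted-domain filtering can be approximated as below. This is a sketch that post-filters a list of result dictionaries; it does not call the Tavily API, the `results` shape (a dict with a `url` key) is an assumption, and only the four domains named above are listed.

```python
from urllib.parse import urlparse

# Excerpt of the allowlist; the full list has 50+ domains.
TRUSTED_DOMAINS = {"dead.net", "archive.org", "deaddisc.com", "jerrybase.com"}


def is_trusted(url: str) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def filter_results(results: list[dict], max_results: int = 5) -> list[dict]:
    """Keep trusted results only, capped within the documented 3-10 range."""
    max_results = max(3, min(10, max_results))
    return [r for r in results if is_trusted(r["url"])][:max_results]
```

Matching on the parsed hostname, rather than on a substring of the URL, avoids accepting look-alike hosts such as `notdead.net` or `archive.org.example.com`.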
Token Optimization System (Optional)
Purpose: Intelligent context selection based on query intent
How It Works:
- Intent Detection: Analyzes query to determine topic
- Context Selection: Loads only relevant context files
- Token Budgeting: Enforces token limits (default 500)
- Caching: Stores fragments for faster retrieval
Example: Equipment question loads equipment files, not setlist data
Benefit: Faster responses, lower API costs, focused context
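A simplified sketch of the intent detection and budgeting steps follows. The keyword sets, the four-characters-per-token estimate, and all function names are assumptions made for illustration; only the default budget of 500 comes from the description above.

```python
# Hypothetical keyword sets per intent.
INTENT_KEYWORDS = {
    "equipment": {"guitar", "amp", "wall of sound", "tiger", "wolf"},
    "setlist": {"setlist", "show", "played", "concert"},
}


def detect_intent(query: str) -> str:
    """Pick the intent with the most keyword hits, or 'general' if none match."""
    lowered = query.lower()
    scores = {
        intent: sum(keyword in lowered for keyword in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def select_fragments(fragments: list[str], budget: int = 500) -> list[str]:
    """Greedily keep fragments, in order, that still fit the token budget."""
    chosen, used = [], 0
    for fragment in fragments:
        cost = estimate_tokens(fragment)
        if used + cost > budget:
            continue  # skip anything that would exceed the budget
        chosen.append(fragment)
        used += cost
    return chosen
```

An equipment question would map to the `equipment` intent, and only fragments from the matching files would then compete for the budget.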
System Prompt Guardrails
Purpose: Explicit rules to prevent common errors
Key Guardrails:
- Never invent setlists or show dates
- Always cite sources for quotes
- Distinguish between studio and live versions
- Clarify composer vs. performer
- Use correct venue names and locations
- Verify equipment specifications
- Acknowledge uncertainty when appropriate
- Prioritize official sources
Benefit: Enforces accuracy standards at the AI processing level
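One plausible way to apply these rules is to append them to the system prompt as a numbered list. The sketch below uses the eight guardrails listed above verbatim; the `build_system_prompt` function and the section header text are hypothetical.

```python
GUARDRAILS = [
    "Never invent setlists or show dates",
    "Always cite sources for quotes",
    "Distinguish between studio and live versions",
    "Clarify composer vs. performer",
    "Use correct venue names and locations",
    "Verify equipment specifications",
    "Acknowledge uncertainty when appropriate",
    "Prioritize official sources",
]


def build_system_prompt(base_prompt: str, guardrails: list[str] = GUARDRAILS) -> str:
    """Append the guardrails to the base prompt as a numbered rule list."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(guardrails, start=1))
    return f"{base_prompt}\n\nACCURACY RULES (follow all of them):\n{rules}"


prompt = build_system_prompt("You are a Grateful Dead expert assistant.")
```

Keeping the rules in a plain list makes it easy to add or reorder guardrails without touching the prompt template.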
🎯 How It All Works Together
Example Query: “Tell me about Dark Star at Cornell”
Layer 1 (Disambiguation): “Dark Star” = song (not astronomy)
Layer 2 (Sanitization): No conflicting data to filter
Layer 3 (Knowledge Base): Loads song info from music-and-recordings.md
Layer 4 (Context Files): Searches setlists/1977.csv for 5/8/77
Layer 5 (Pinecone): Finds related Cornell ’77 content
Layer 6 (Tavily): Searches for current Cornell ’77 discussions
Layer 7 (Token Optimization): Focuses on setlist + song data
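Layer 8 (Guardrails): Ensures accurate setlist reporting; if the indexed 5/8/77 setlist does not list “Dark Star,” the response says so rather than inventing a performance
Result: Accurate, comprehensive response grounded in the verified setlist, song history, and current context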
🎵 Music Streaming Integration (v2.2.0)
New Accuracy Layer: Archive.org Database
Version 2.2.0 adds a ninth layer, dedicated to music streaming, on top of the eight core layers:
- Database: 4 new tables with Archive.org metadata
- Shows: 2,340+ shows with complete information
- Recordings: Individual track data
- Sync: Automatic background updates
- Detection: 600+ songs automatically recognized
Benefit: Song mentions become clickable links with instant access to live recordings
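Song detection and linking could be sketched as a regular-expression pass over the response text. The three-title list, the Markdown link output, and the Archive.org search URL pattern are all assumptions for illustration; the plugin's actual link target and markup are not documented here.

```python
import re
from urllib.parse import quote_plus

# Hypothetical excerpt; the real catalog recognizes 600+ songs.
SONG_TITLES = ["Dark Star", "Scarlet Begonias", "Fire on the Mountain"]

# Longest titles first, so a longer title wins over a shorter one it contains.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(SONG_TITLES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def link_songs(text: str) -> str:
    """Wrap each recognized song title in a Markdown link to a recordings search."""

    def to_link(match: re.Match) -> str:
        title = match.group(0)
        # Assumed URL pattern for an Archive.org search.
        url = "https://archive.org/search?query=" + quote_plus(f"Grateful Dead {title}")
        return f"[{title}]({url})"

    return _PATTERN.sub(to_link, text)
```

A production version would presumably link to specific recordings from the synced tables rather than to a generic search.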
📈 Accuracy Metrics
| Metric | Value | Description |
|---|---|---|
| Source Verification | 100% | All context files from verified sources |
| Show Data Accuracy | 100% | 2,340 shows with verified setlists |
| Song Detection | 600+ | Grateful Dead songs automatically recognized |
| Disambiguation Coverage | 125+ | Terms with explicit context clarification |
| Context Files | 55+ | Specialized knowledge sources |
| Trusted Domains | 50+ | Pre-approved websites for web search |
| Response Time | < 3s | Average response with full context |
🔒 Quality Assurance
Common Errors We Prevent
- ❌ Inventing show dates or setlists
- ❌ Confusing venue locations (e.g., Bahr Gallery)
- ❌ Misattributing songs to wrong composers
- ❌ Mixing up equipment specifications
- ❌ Confusing studio vs. live versions
- ❌ Using unreliable sources
- ❌ Hallucinating band member quotes
- ❌ Incorrect disambiguation of terms
🎯 Best Practices for Users
How to Get the Most Accurate Responses
- Be Specific: “Dark Star at Cornell ’77” vs. “Dark Star”
- Use Dates: “5/8/77” or “May 8, 1977”
- Specify Context: “Jerry’s Tiger guitar” vs. just “Tiger”
- Ask for Sources: “Where can I verify this?”
- Clarify Ambiguity: “The Matrix venue” vs. “The Matrix”
- Request Details: “Full setlist” vs. “What songs”
📊 System Status
Current Configuration
- 8 Core Topic Files Loaded
- 55+ Context Files Available
- 125+ Terms Disambiguated
- 2,340+ Shows Indexed
- 600+ Songs Detected
- 50+ Trusted Domains Configured
- Archive.org Integration Active
- Streaming Services Available (Optional)
🔮 Future Enhancements
Planned improvements to the accuracy system:
- Machine Learning: Train on user feedback to improve responses
- Expanded Sources: Add more verified Grateful Dead resources
- Real-Time Verification: Cross-check facts against multiple sources
- User Corrections: Allow users to report inaccuracies
- Confidence Scores: Display confidence level for each response
- Source Citations: Automatic footnotes for all facts
GD Chatbot v2.2.0 Accuracy Systems | Eight layers of verification for the most accurate Grateful Dead information | IT Influentials