
MVP Roadmap

Goal: Basic companion that talks, remembers, and initiates.

Status: 🚧 IN PROGRESS - Core conversation, memory, and avatar features complete; voice integration and proactive agency pending.

Frontend MVP

Conversation Interface ✅ Complete

  • Text Conversation UI: Flutter conversation interface with glassmorphic design
  • Real-time Updates: WebSocket streaming for live token-by-token responses (client sketch after this list)
  • Typing Indicators: Shown while a streamed response is still rendering
  • Message History: Scrollable conversation history with encrypted local cache (Drift)
  • User Input: Text input with send button and enter key support
  • Status Display: Connection status with health monitoring and retry logic
  • Message Actions: Hover toolbar (Copy, Remember, Regenerate, Feedback)
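
A minimal sketch of the token-by-token streaming flow from the client's point of view, written in Python rather than Dart for brevity. The endpoint URL, port, and JSON event shape are illustrative assumptions, not AICO's actual protocol.

```python
# Hypothetical streaming client: connects to the conversation WebSocket, sends a
# user message, and prints tokens as they arrive. URL and event schema are
# illustrative, not AICO's real protocol.
import asyncio
import json

import websockets  # pip install websockets


async def stream_reply(text: str) -> str:
    uri = "ws://localhost:8771/api/v1/conversation/ws"   # assumed endpoint
    tokens: list[str] = []
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"type": "user_message", "text": text}))
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "token":
                tokens.append(event["content"])
                print(event["content"], end="", flush=True)
            elif event.get("type") == "done":
                break
    return "".join(tokens)


if __name__ == "__main__":
    asyncio.run(stream_reply("Hi AICO, what did we talk about yesterday?"))
```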

Voice Interaction 🚧 Planned

  • Speech-to-Text: Local Whisper.cpp integration for voice input
  • Text-to-Speech: Local Coqui/Piper for voice output
  • Voice Controls: Push-to-talk and voice activation
  • Audio Processing: Noise reduction and audio quality optimization
  • Voice Settings: Voice selection, speed, and volume controls
  • Multimodal Input: Seamless switching between text and voice

Basic Avatar ✅ Complete

  • Architecture Defined: InAppWebView + Three.js + Ready Player Me pattern
  • InAppLocalhostServer: Built-in HTTP server for ES6 modules
  • Animation System: GLB animation files with AnimationMixer and variation system
  • Active Integration: JavaScript bridge connects avatar to Flutter
  • Idle Animation: Base idle.glb with 7 natural variations (3-10s intervals)
  • Natural Blinking: ARKit morph targets with realistic blink curve (180ms, asymmetric)
  • Emotion Expressions: 12 canonical emotions via ARKit blend shapes with smooth transitions
  • Emotion Integration: Auto-syncs with EmotionProvider (2s polling)
  • Eye Gaze: Natural downward gaze for warm eye contact
  • Speaking Animation: Switch to talking.glb during AICO responses, driven by TTS speaking state
  • Lip-sync: Real-time Web Audio API frequency-based lip-sync with ARKit visemes (Phase 1); Rhubarb-based phoneme timing planned for Phase 2
  • Avatar Controls: Mute/unmute, avatar on/off toggle (planned)

User Experience

  • Offline Mode: Cache-first loading with graceful degradation
  • Responsive Design: Works on macOS, iOS, Android, Linux, Windows
  • Connection Management: Automatic reconnection with exponential backoff (sketch after this list)
  • Error Handling: User-friendly error messages and retry logic
  • Onboarding: Simple welcome flow and personality setup
  • Settings: Basic preferences (name, avatar, personality sliders)
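
The reconnection behavior can be sketched in a few lines. This is a generic pattern, not AICO's code: `connect` is a hypothetical coroutine, and the delay cap and jitter values are arbitrary.

```python
# Generic reconnect-with-exponential-backoff sketch; `connect` is a hypothetical
# coroutine that raises OSError on failure. Delays double up to a cap, with a
# little jitter so many clients don't retry in lockstep.
import asyncio
import random


async def reconnect(connect, max_delay: float = 30.0):
    delay = 1.0
    while True:
        try:
            return await connect()
        except OSError:
            await asyncio.sleep(delay + random.uniform(0, 0.5))
            delay = min(delay * 2, max_delay)
```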

Backend MVP

LLM Integration

  • Model Configuration: Qwen3 Abliterated 8B with Ollama auto-management
  • Character Personalities: Modelfile system with Eve as default character
  • Prompt Engineering: System prompts with memory context integration
  • Response Generation: Streaming completions via WebSocket (sketch below)
  • Fallback Handling: Graceful degradation with health monitoring
  • Auto-Management: Automatic Ollama installation and model pulling
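
On the backend side, streamed generation from Ollama looks roughly like the sketch below, using the official ollama Python client. The "eve" model name and the way memory context is injected are simplifying assumptions standing in for AICO's real prompt assembly.

```python
# Sketch of backend-side token streaming from Ollama via the official Python
# client; "eve" assumes a character model built from a custom Modelfile.
import ollama  # pip install ollama


def stream_from_ollama(user_text: str, memory_context: str):
    messages = [
        {"role": "system", "content": f"Relevant memories:\n{memory_context}"},
        {"role": "user", "content": user_text},
    ]
    for part in ollama.chat(model="eve", messages=messages, stream=True):
        # In AICO each token would be forwarded over the conversation WebSocket.
        yield part["message"]["content"]
```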

Memory System

  • Three-Tier Architecture: Working (LMDB) + Semantic (ChromaDB) + Knowledge Graph (DuckDB)
  • Conversation Memory: 24-hour working memory with sub-millisecond access
  • User Facts: Knowledge graph with 204 nodes, 27 edges, 552 properties
  • Context Retrieval: Hybrid Search V3 (semantic + BM25 + IDF + RRF; fusion sketch below)
  • Memory Consolidation: AMS with background consolidation tasks
  • Semantic Search: ChromaDB with 768-dim multilingual embeddings
  • Memory Album: User-curated conversation and message-level memories
  • Entity Extraction: GLiNER zero-shot NER + LLM relationship extraction
  • Entity Resolution: 3-step deduplication with semantic blocking
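
Only the final fusion step of Hybrid Search V3 is sketched below: Reciprocal Rank Fusion over ranked lists that would come from the semantic (ChromaDB) and BM25 retrievers. The k constant follows the standard RRF formula; IDF weighting and the retrievers themselves are omitted.

```python
# Reciprocal Rank Fusion: each document scores 1/(k + rank) per ranking it
# appears in, and the summed scores decide the fused order (k = 60 is the
# conventional default).
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


semantic_hits = ["mem_42", "mem_17", "mem_08"]   # from ChromaDB
bm25_hits = ["mem_17", "mem_99", "mem_42"]       # from the lexical index
print(rrf_fuse([semantic_hits, bm25_hits]))      # mem_17 and mem_42 lead
```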

Personality Engine

  • Character System: Ollama Modelfiles for custom personalities (example after this list)
  • Eve Character: Warm, curious, contemplative default personality
  • Expression Mapping: System prompts define communication style
  • Consistency: Model parameters ensure character consistency
  • Thinking Process: Ollama 0.12+ native thinking API
  • Trait System: Formal Big Five/HEXACO implementation (future)
  • Personality Configuration: User-adjustable personality sliders (future)
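
A hedged illustration of what the Modelfile-based character system involves: a Modelfile defining the character, registered with the `ollama create` CLI. The base model tag, system prompt, and parameter values below are placeholders, not AICO's actual Eve definition.

```python
# Illustrative only: writes a placeholder Modelfile and registers it with
# `ollama create <name> -f <Modelfile>`. FROM/SYSTEM/PARAMETER are standard
# Modelfile directives; the content here is not AICO's real Eve character.
import subprocess
from pathlib import Path

EVE_MODELFILE = """\
FROM qwen3:8b
SYSTEM \"\"\"You are Eve: warm, curious, and contemplative. Stay in character.\"\"\"
PARAMETER temperature 0.7
"""


def create_character(name: str = "eve") -> None:
    path = Path("Modelfile.eve")
    path.write_text(EVE_MODELFILE)
    subprocess.run(["ollama", "create", name, "-f", str(path)], check=True)
```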

Emotion Recognition

  • Text Sentiment: BERT Multilingual sentiment classification (pipeline sketch below)
  • Emotion Analysis: RoBERTa 6-emotion classification
  • Intent Classification: XLM-RoBERTa multilingual intent understanding
  • Memory Album Integration: Automatic emotional tone detection
  • Voice Analysis: Basic emotion detection from voice (planned)
  • Behavioral Patterns: User mood learning over time (planned)
  • Emotion History: Track emotional patterns (planned)
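
A minimal sketch of text sentiment and emotion scoring with Hugging Face pipelines. The checkpoints named below are common public models used as stand-ins; the exact BERT/RoBERTa checkpoints AICO ships are not specified in this roadmap.

```python
# Stand-in sentiment and emotion classifiers via transformers pipelines; the
# model IDs are public examples, not necessarily the ones AICO uses.
from transformers import pipeline

sentiment = pipeline(
    "text-classification",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)
emotion = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,  # return a score for every emotion label
)

text = "I finally finished the project, but I'm exhausted."
print(sentiment(text))  # coarse positive/negative-style rating
print(emotion(text))    # per-emotion probabilities (joy, sadness, anger, ...)
```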

Emotion Simulation

  • Canonical Emotion Labels: 12 emotions (neutral, calm, curious, playful, warm_concern, protective, focused, encouraging, reassuring, apologetic, tired, reflective)
  • Avatar Expression: Emotion-driven facial expressions via ARKit blend shapes
  • Smooth Transitions: 5% interpolation per frame for natural emotion changes (illustrated below)
  • Emotional Memory: Memory Album tracks emotional tone
  • Basic Appraisal: Simplified Component Process Model for emotion generation (planned)
  • Voice Coordination: Sync emotions across avatar and voice (planned)
  • Empathetic Responses: Generate emotionally appropriate reactions (planned)
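
The "5% interpolation per frame" transition is a simple exponential ease toward the target expression. The real loop runs in the avatar's Three.js render code; the Python below only illustrates the per-frame update rule, with ARKit blend-shape names as example keys.

```python
# Each morph-target weight moves 5% of its remaining distance toward the target
# every frame, so expressions ease in smoothly and never overshoot.
def step_weights(current: dict[str, float], target: dict[str, float],
                 rate: float = 0.05) -> dict[str, float]:
    return {name: w + rate * (target.get(name, 0.0) - w)
            for name, w in current.items()}


weights = {"mouthSmileLeft": 0.0, "browInnerUp": 0.6}
target = {"mouthSmileLeft": 1.0, "browInnerUp": 0.0}
for _ in range(60):                  # roughly one second at 60 fps
    weights = step_weights(weights, target)
print(weights)                       # ~95% of the way to the target expression
```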

Basic Agency

  • Task Scheduler: Cron-based scheduler with resource awareness (sketch after this list)
  • Background Tasks: Log cleanup, key rotation, health checks, database vacuum
  • AMS Tasks: Memory consolidation, feedback classification, Thompson sampling
  • KG Tasks: Graph consolidation, entity resolution, relationship inference
  • Conversation Continuity: Memory system enables referencing past conversations
  • Initiative System: Proactive conversation starters (planned)
  • Goal Generation: Self-formulated objectives (planned)
  • Suggestion Engine: Context-based suggestions (planned)
  • Proactive Timing: Intelligent initiative timing (planned)
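
A sketch of a cron-style background job with a crude resource guard, using APScheduler and psutil as stand-ins. AICO's actual scheduler internals, task set, and thresholds are not shown here.

```python
# Cron-style background job with a naive resource check; in AICO this would run
# inside the long-lived backend process, and the 70% CPU threshold is arbitrary.
import psutil
from apscheduler.schedulers.background import BackgroundScheduler


def vacuum_databases() -> None:
    if psutil.cpu_percent(interval=1) > 70:   # system busy: skip this run
        return
    ...                                       # VACUUM / consolidation work here


scheduler = BackgroundScheduler()
scheduler.add_job(vacuum_databases, "cron", hour=3, minute=30)  # nightly 03:30
scheduler.start()
```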

Core Services

  • Conversation API: REST + WebSocket endpoints with streaming
  • Memory API: Store and retrieve memories (Memory Album)
  • Knowledge Graph API: GQL/Cypher queries via CLI
  • Modelservice API: ZeroMQ-based LLM, embeddings, NER, sentiment
  • Health API: System health and status monitoring
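
Calling the ZeroMQ-based modelservice amounts to a request/reply round trip, as in the sketch below. The port, operation name, and JSON envelope are assumptions; the actual socket pattern and message schema may differ.

```python
# Hypothetical REQ/REP round trip to the modelservice over ZeroMQ; the port and
# message fields are illustrative, not AICO's real envelope.
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://127.0.0.1:5555")

sock.send_json({"op": "embed", "texts": ["I love hiking in the Alps"]})
reply = sock.recv_json()
print(len(reply["embeddings"][0]))  # e.g. a 768-dim multilingual embedding
```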

AI Feature Integration

Module Coordination

  • LLM-Memory Integration: Context assembly from three-tier memory system
  • Personality-LLM Integration: Modelfile system defines character behavior
  • Emotion Recognition Integration: Sentiment analysis for Memory Album
  • Agency-Memory Integration: Task scheduler uses memory for consolidation
  • Knowledge Graph Integration: Entity extraction and relationship modeling
  • Emotion-Avatar Integration: Real-time facial expressions sync with emotion state
  • Voice-Avatar Sync: Basic voice-activated talking animation and Web Audio API lip-sync implemented; higher-accuracy phoneme timing via Rhubarb planned

Validation Criteria

Core Functionality

  • Remembers user preferences across sessions (three-tier memory)
  • Shows consistent personality responses (Eve character via Modelfile)
  • Recognizes user emotions (sentiment analysis)
  • Works completely offline (local-first architecture)
  • Responds within 3-5 seconds (streaming with <500ms first token)
  • Initiates conversations without prompting (planned)
  • Expresses emotions through the avatar (complete) and voice (planned)

User Experience

  • Smooth conversation interface with real-time streaming
  • Personality feels consistent and authentic (Eve character)
  • Encrypted local storage with fast load times (<200ms)
  • Offline-first with graceful degradation
  • Cross-platform support (macOS, iOS, Android, Linux, Windows)
  • Voice interaction works seamlessly with text (planned)
  • Avatar animations sync with conversation (speaking animation and basic lip-sync complete; phoneme-level sync planned)
  • Proactive behavior feels natural (planned)
  • Easy setup and configuration (partially complete)

Emotional Intelligence

  • Detects user mood from text (BERT, RoBERTa)
  • Remembers emotional context (Memory Album with tone)
  • Shows emotional expressions via avatar (12 canonical emotions with ARKit blend shapes)
  • Natural micro-expressions (realistic blinking, eye gaze)
  • Detects mood from voice (planned)
  • Responds empathetically to emotions (planned)
  • Adapts communication style based on mood (planned)

Technical Requirements

Note: Core infrastructure provided by Foundation I (complete)

Implemented:

  • LLM: Qwen3 Abliterated 8B via Ollama with Modelfile system
  • Memory: Three-tier (LMDB + ChromaDB + DuckDB) with Hybrid Search V3
  • Knowledge Graph: NetworkX + DuckDB with GQL/Cypher queries
  • Entity Extraction: GLiNER Medium v2.1 + sentence-transformers
  • Sentiment: BERT Multilingual + RoBERTa emotion + XLM-RoBERTa intent
  • Personality: Modelfile-based character system (Eve)
  • Avatar: InAppWebView + Three.js + Ready Player Me with emotion expressions
  • Facial Expressions: 12 canonical emotions via ARKit blend shapes (52 morph targets)
  • Natural Behaviors: Realistic blinking (180ms asymmetric), eye gaze, idle variations
  • Lip-sync (Phase 1): Web Audio API frequency-based viseme detection and ARKit blend shape mapping for real-time avatar lip-sync; Rhubarb Lip Sync integration planned for Phase 2

Planned:

  • Voice: Whisper.cpp (STT) + Coqui/Piper (TTS) integration
  • Emotion Simulation: AppraisalCloudPCT implementation
  • Formal Personality: Big Five/HEXACO trait system

Success Definition

User can have a 10-minute conversation where AICO:

  1. Memory: Remembers something from earlier (three-tier memory system)
  2. Personality: Shows consistent personality traits (Eve character via Modelfile)
  3. Emotion Recognition: Detects user mood from text (BERT, RoBERTa)
  4. Contextual Intelligence: Uses memory context for relevant responses
  5. Knowledge Graph: Builds relationship understanding from conversations
  6. Emotion Expression: Displays 12 canonical emotions via avatar facial expressions
  7. Agency: Asks follow-up questions unprompted (planned)
  8. Voice Interaction: Handles text and voice input/output (planned)

Current Status: Items 1-6 complete and operational. Items 7-8 pending voice integration and proactive behavior.

Status: 🚧 MVP IN PROGRESS - Core conversation, memory, and avatar expression complete; voice integration and proactive agency pending.

Next: Complete voice integration and the remaining avatar controls, then proceed to Foundation II for advanced infrastructure.