# MVP Roadmap
Goal: Basic companion that talks, remembers, and initiates.
Status: 🚧 IN PROGRESS - Core conversation, memory, and avatar features complete; voice integration and proactive behavior pending.
## Frontend MVP

### Conversation Interface ✅ Complete
- Text Conversation UI: Flutter conversation interface with glassmorphic design
- Real-time Updates: WebSocket streaming for live token-by-token responses (see the streaming sketch after this list)
- Typing Indicators: Streaming response display with real-time updates
- Message History: Scrollable conversation history with encrypted local cache (Drift)
- User Input: Text input with send button and enter key support
- Status Display: Connection status with health monitoring and retry logic
- Message Actions: Hover toolbar (Copy, Remember, Regenerate, Feedback)
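
A minimal sketch of the server side of this token stream, assuming a FastAPI-style WebSocket endpoint; the real AICO gateway's path, auth, and message schema are not specified here and will differ. The Flutter UI appends each chunk to the live message bubble as it arrives.

```python
# Illustrative token-by-token streaming over WebSocket (assumed FastAPI gateway).
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def generate_tokens(prompt: str):
    """Placeholder for the real LLM call; yields tokens as they arrive."""
    for token in ["Hello", ",", " I", " remember", " that", "."]:
        yield token

@app.websocket("/ws/conversation")
async def conversation(ws: WebSocket):
    await ws.accept()
    while True:
        user_message = await ws.receive_text()
        async for token in generate_tokens(user_message):
            # Client renders each chunk immediately for the typing effect.
            await ws.send_json({"type": "token", "content": token})
        await ws.send_json({"type": "done"})
```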
### Voice Interaction 🚧 Planned
- Speech-to-Text: Local Whisper.cpp integration for voice input (sketched after this list)
- Text-to-Speech: Local Coqui/Piper for voice output
- Voice Controls: Push-to-talk and voice activation
- Audio Processing: Noise reduction and audio quality optimization
- Voice Settings: Voice selection, speed, and volume controls
- Multimodal Input: Seamless switching between text and voice
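
Since this integration is still planned, the following is only a rough sketch of local STT by shelling out to a whisper.cpp build; the binary name and model path are assumptions and vary by version and build.

```python
# Hypothetical local speech-to-text via the whisper.cpp example binary.
import subprocess

def transcribe(wav_path: str,
               whisper_bin: str = "./main",               # build-dependent binary name
               model_path: str = "models/ggml-base.bin"   # placeholder model file
               ) -> str:
    result = subprocess.run(
        [whisper_bin, "-m", model_path, "-f", wav_path],
        capture_output=True, text=True, check=True,
    )
    # whisper.cpp prints timestamped transcript segments to stdout.
    return result.stdout.strip()
```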
### Basic Avatar ✅ Complete
- Architecture Defined: InAppWebView + Three.js + Ready Player Me pattern
- InAppLocalhostServer: Built-in HTTP server for ES6 modules
- Animation System: GLB animation files with AnimationMixer and variation system
- Active Integration: JavaScript bridge connects avatar to Flutter
- Idle Animation: Base idle.glb with 7 natural variations (3-10s intervals)
- Natural Blinking: ARKit morph targets with realistic blink curve (180ms, asymmetric)
- Emotion Expressions: 12 canonical emotions via ARKit blend shapes with smooth transitions
- Emotion Integration: Auto-syncs with EmotionProvider (2s polling)
- Eye Gaze: Natural downward gaze for warm eye contact
- Speaking Animation: Switch to talking.glb during AICO responses, driven by TTS speaking state
- Lip-sync: Real-time Web Audio API frequency-based lip-sync with ARKit visemes (Phase 1); Rhubarb-based phoneme timing planned for Phase 2
- Avatar Controls: Mute/unmute, avatar on/off toggle (planned)
### User Experience
- Offline Mode: Cache-first loading with graceful degradation
- Responsive Design: Works on macOS, iOS, Android, Linux, Windows
- Connection Management: Automatic reconnection with exponential backoff (see the backoff sketch after this list)
- Error Handling: User-friendly error messages and retry logic
- Onboarding: Simple welcome flow and personality setup
- Settings: Basic preferences (name, avatar, personality sliders)
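
The reconnection logic lives in the Flutter/Dart client; the Python below is only a language-agnostic illustration of exponential backoff with jitter, and the base delay and cap are example values.

```python
# Illustrative reconnection loop with exponential backoff and jitter.
import asyncio
import random

async def connect_with_backoff(connect, base: float = 0.5, cap: float = 30.0):
    attempt = 0
    while True:
        try:
            return await connect()
        except OSError:
            delay = min(cap, base * 2 ** attempt)
            delay *= random.uniform(0.5, 1.0)   # jitter avoids synchronized retries
            await asyncio.sleep(delay)
            attempt += 1
```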
## Backend MVP

### LLM Integration
- Model Configuration: Qwen3 Abliterated 8B with Ollama auto-management
- Character Personalities: Modelfile system with Eve as default character
- Prompt Engineering: System prompts with memory context integration
- Response Generation: Streaming completions via WebSocket (illustrated after this list)
- Fallback Handling: Graceful degradation with health monitoring
- Auto-Management: Automatic Ollama installation and model pulling
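
A sketch of consuming Ollama's streaming chat endpoint (`POST /api/chat`, newline-delimited JSON), whose chunks the gateway relays over the conversation WebSocket. The model tag is a placeholder, and in AICO the Eve character prompt comes from the Modelfile rather than being passed here.

```python
# Stream tokens from Ollama's chat API.
import json
import requests

def stream_chat(user_text: str, model: str = "qwen3-abliterated:8b"):  # placeholder tag
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": True,
    }
    with requests.post("http://localhost:11434/api/chat",
                       json=payload, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk["message"]["content"]
```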
### Memory System
- Three-Tier Architecture: Working (LMDB) + Semantic (ChromaDB) + Knowledge Graph (DuckDB)
- Conversation Memory: 24-hour working memory with sub-millisecond access
- User Facts: Knowledge graph with 204 nodes, 27 edges, 552 properties
- Context Retrieval: Hybrid Search V3 (semantic + BM25 + IDF + RRF) — see the RRF sketch after this list
- Memory Consolidation: AMS with background consolidation tasks
- Semantic Search: ChromaDB with 768-dim multilingual embeddings
- Memory Album: User-curated conversation and message-level memories
- Entity Extraction: GLiNER zero-shot NER + LLM relationship extraction
- Entity Resolution: 3-step deduplication with semantic blocking
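
A minimal sketch of Reciprocal Rank Fusion (RRF), the merge step in Hybrid Search V3: ranked lists from semantic search and BM25 are fused by summing `1 / (k + rank)`. `k = 60` is the common default, not a confirmed AICO setting.

```python
# Fuse several ranked result lists with Reciprocal Rank Fusion.
def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. rrf_merge([semantic_ids, bm25_ids]) -> fused ranking for context assembly
```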
### Personality Engine
- Character System: Ollama Modelfiles for custom personalities
- Eve Character: Warm, curious, contemplative default personality
- Expression Mapping: System prompts define communication style
- Consistency: Model parameters ensure character consistency
- Thinking Process: Ollama 0.12+ native thinking API
- Trait System: Formal Big Five/HEXACO implementation (future)
- Personality Configuration: User-adjustable personality sliders (future)
### Emotion Recognition
- Text Sentiment: BERT Multilingual sentiment classification (example after this list)
- Emotion Analysis: RoBERTa 6-emotion classification
- Intent Classification: XLM-RoBERTa multilingual intent understanding
- Memory Album Integration: Automatic emotional tone detection
- Voice Analysis: Basic emotion detection from voice (planned)
- Behavioral Patterns: User mood learning over time (planned)
- Emotion History: Track emotional patterns (planned)
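
A sketch of the text classifiers using Hugging Face pipelines. The checkpoints below are publicly available stand-ins, not necessarily the fine-tuned models AICO ships; the XLM-RoBERTa intent classifier is loaded the same way with its own checkpoint.

```python
# Text sentiment and emotion classification via transformers pipelines.
from transformers import pipeline

sentiment = pipeline("text-classification",
                     model="nlptown/bert-base-multilingual-uncased-sentiment")
emotion = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base")

text = "I finally finished the project, but I'm exhausted."
print(sentiment(text))  # e.g. [{'label': '4 stars', 'score': ...}]
print(emotion(text))    # e.g. [{'label': 'joy', 'score': ...}]
```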
### Emotion Simulation
- Canonical Emotion Labels: 12 emotions (neutral, calm, curious, playful, warm_concern, protective, focused, encouraging, reassuring, apologetic, tired, reflective)
- Avatar Expression: Emotion-driven facial expressions via ARKit blend shapes
- Smooth Transitions: 5% interpolation per frame for natural emotion changes (see the smoothing sketch after this list)
- Emotional Memory: Memory Album tracks emotional tone
- Basic Appraisal: Simplified Component Process Model for emotion generation (planned)
- Voice Coordination: Sync emotions across avatar and voice (planned)
- Empathetic Responses: Generate emotionally appropriate reactions (planned)
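
The 5% per-frame interpolation amounts to exponential smoothing of each ARKit blend-shape weight toward its target. The real code runs in the Three.js avatar layer; the Python below is a language-agnostic illustration, and the 60 fps figure is a nominal assumption.

```python
# Ease blend-shape weights toward a target expression, 5% per frame.
ALPHA = 0.05  # fraction of the remaining distance covered each frame

def step(weights: dict[str, float], target: dict[str, float]) -> None:
    for name, goal in target.items():
        current = weights.get(name, 0.0)
        weights[name] = current + ALPHA * (goal - current)

# The residual shrinks by a factor of 0.95 per frame, so a transition is
# ~95% complete after about 60 frames (0.95**60 ≈ 0.05), i.e. roughly one
# second at a nominal 60 fps.
```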
### Basic Agency
- Task Scheduler: Cron-based scheduler with resource awareness (cron example after this list)
- Background Tasks: Log cleanup, key rotation, health checks, database vacuum
- AMS Tasks: Memory consolidation, feedback classification, Thompson sampling
- KG Tasks: Graph consolidation, entity resolution, relationship inference
- Conversation Continuity: Memory system enables referencing past conversations
- Initiative System: Proactive conversation starters (planned)
- Goal Generation: Self-formulated objectives (planned)
- Suggestion Engine: Context-based suggestions (planned)
- Proactive Timing: Intelligent initiative timing (planned)
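
AICO's scheduler is its own resource-aware implementation; the sketch below only shows the shape of cron-style wiring using APScheduler, with made-up schedules and placeholder task bodies.

```python
# Illustrative cron-style scheduling of background maintenance tasks.
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger

def cleanup_logs(): ...
def consolidate_memory(): ...

scheduler = BackgroundScheduler()
scheduler.add_job(cleanup_logs, CronTrigger.from_crontab("0 3 * * *"))           # nightly
scheduler.add_job(consolidate_memory, CronTrigger.from_crontab("*/30 * * * *"))  # every 30 min
scheduler.start()
```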
### Core Services
- Conversation API: REST + WebSocket endpoints with streaming
- Memory API: Store and retrieve memories (Memory Album)
- Knowledge Graph API: GQL/Cypher queries via CLI
- Modelservice API: ZeroMQ-based LLM, embeddings, NER, sentiment (request example after this list)
- Health API: System health and status monitoring
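
A sketch of a request to the ZeroMQ-based modelservice. The address, socket pattern (REQ/REP here), and message envelope are assumptions for illustration; AICO's actual transport and schema may differ.

```python
# Hypothetical embedding request to the modelservice over ZeroMQ.
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://127.0.0.1:5555")  # placeholder address

sock.send_json({"op": "embed", "texts": ["I love hiking in autumn."]})
reply = sock.recv_json()
print(reply)  # e.g. {"embeddings": [[...768 floats...]]}
```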
## AI Feature Integration

### Module Coordination
- LLM-Memory Integration: Context assembly from three-tier memory system (see the sketch after this list)
- Personality-LLM Integration: Modelfile system defines character behavior
- Emotion Recognition Integration: Sentiment analysis for Memory Album
- Agency-Memory Integration: Task scheduler uses memory for consolidation
- Knowledge Graph Integration: Entity extraction and relationship modeling
- Emotion-Avatar Integration: Real-time facial expressions sync with emotion state
- Voice-Avatar Sync: Basic voice-activated talking animation and Web Audio API lip-sync implemented; higher-accuracy phoneme timing via Rhubarb planned
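
A rough sketch of the LLM-memory integration: context is assembled from the three tiers before the model call. Every helper name below is a hypothetical stand-in for AICO's working-memory, semantic, and knowledge-graph lookups, not its actual API.

```python
# Assemble prompt context from the three memory tiers (hypothetical helpers).
def assemble_context(user_text: str,
                     working_memory,      # recent turns (LMDB)
                     semantic_store,      # hybrid search (ChromaDB)
                     knowledge_graph):    # user facts (DuckDB)
    recent = working_memory.last_turns(n=10)
    related = semantic_store.hybrid_search(user_text, top_k=5)
    facts = knowledge_graph.facts_about(user_text, limit=5)
    return "\n".join([
        "Relevant facts:\n" + "\n".join(facts),
        "Related memories:\n" + "\n".join(related),
        "Recent conversation:\n" + "\n".join(recent),
        "User: " + user_text,
    ])
```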
## Validation Criteria

### Core Functionality
- Remembers user preferences across sessions (three-tier memory)
- Shows consistent personality responses (Eve character via Modelfile)
- Recognizes user emotions (sentiment analysis)
- Works completely offline (local-first architecture)
- Responds within 3-5 seconds (streaming with <500ms first token)
- Initiates conversations without prompting (planned)
- Expresses emotions through the avatar (complete) and voice (planned)
### User Experience
- Smooth conversation interface with real-time streaming
- Personality feels consistent and authentic (Eve character)
- Encrypted local storage with fast load times (<200ms)
- Offline-first with graceful degradation
- Cross-platform support (macOS, iOS, Android, Linux, Windows)
- Voice interaction works seamlessly with text (planned)
- Avatar animations sync with conversation (talking animation and Phase 1 lip-sync complete; phoneme-accurate sync planned)
- Proactive behavior feels natural (planned)
- Easy setup and configuration (partially complete)
### Emotional Intelligence
- Detects user mood from text (BERT, RoBERTa)
- Remembers emotional context (Memory Album with tone)
- Shows emotional expressions via avatar (12 canonical emotions with ARKit blend shapes)
- Natural micro-expressions (realistic blinking, eye gaze)
- Detects mood from voice (planned)
- Responds empathetically to emotions (planned)
- Adapts communication style based on mood (planned)
## Technical Requirements
Note: Core infrastructure provided by Foundation I (complete)
Implemented:

- LLM: Qwen3 Abliterated 8B via Ollama with Modelfile system
- Memory: Three-tier (LMDB + ChromaDB + DuckDB) with Hybrid Search V3
- Knowledge Graph: NetworkX + DuckDB with GQL/Cypher queries
- Entity Extraction: GLiNER Medium v2.1 + sentence-transformers (sketched below)
- Sentiment: BERT Multilingual + RoBERTa emotion + XLM-RoBERTa intent
- Personality: Modelfile-based character system (Eve)
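
A minimal sketch of the zero-shot entity-extraction step above, assuming the published `urchade/gliner_medium-v2.1` checkpoint; the label set and threshold are illustrative, not AICO's configured values.

```python
# Zero-shot NER with GLiNER, feeding the knowledge graph.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
labels = ["person", "place", "hobby", "pet"]  # illustrative label set
text = "My sister Anna just moved to Lisbon with her dog Miro."
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(ent["text"], "->", ent["label"])
```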
Implemented:

- Avatar: InAppWebView + Three.js + Ready Player Me with emotion expressions
- Facial Expressions: 12 canonical emotions via ARKit blend shapes (52 morph targets)
- Natural Behaviors: Realistic blinking (180ms asymmetric), eye gaze, idle variations
- Lip-sync (Phase 1): Web Audio API frequency-based viseme detection and ARKit blend shape mapping for real-time avatar lip-sync; Rhubarb Lip Sync integration planned for Phase 2
Planned:

- Voice: Whisper.cpp (STT) + Coqui/Piper (TTS) integration
- Emotion Simulation: AppraisalCloudPCT implementation
- Formal Personality: Big Five/HEXACO trait system
## Success Definition

User can have a 10-minute conversation where AICO:

1. Memory: Remembers something from earlier (three-tier memory system)
2. Personality: Shows consistent personality traits (Eve character via Modelfile)
3. Emotion Recognition: Detects user mood from text (BERT, RoBERTa)
4. Contextual Intelligence: Uses memory context for relevant responses
5. Knowledge Graph: Builds relationship understanding from conversations
6. Emotion Expression: Displays 12 canonical emotions via avatar facial expressions
7. Agency: Asks follow-up questions unprompted (planned)
8. Voice Interaction: Handles text and voice input/output (planned)
Current Status: Items 1-6 complete and operational. Items 7-8 pending voice integration and proactive behavior.
Status: 🚧 MVP IN PROGRESS - Core conversation, memory, and avatar complete; voice integration and proactive behavior pending.
Next: Complete voice integration and proactive behavior, then proceed to Foundation II for advanced infrastructure.