# MVP Roadmap

Goal: A basic companion that talks, remembers, and initiates. Builds on the Foundation infrastructure.

## Frontend MVP

### Chat Interface
- Text Chat UI: Flutter chat interface with message bubbles
- Real-time Updates: WebSocket connection for live conversation
- Typing Indicators: Show when AICO is thinking/responding
- Message History: Scrollable conversation history
- User Input: Text input with send button and enter key support
- Status Display: Connection status and AICO availability
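The real-time features above imply a small wire protocol between the Flutter client and the backend WebSocket. A minimal sketch of a message envelope covering chat, typing indicators, and status frames (the field names and frame types are assumptions, not a finalized schema):

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ChatMessage:
    """Envelope for one WebSocket frame in the chat protocol."""

    type: str       # "message" | "typing" | "status"
    sender: str     # "user" or "aico"
    text: str = ""
    timestamp: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def typing_indicator(on: bool) -> ChatMessage:
    """Frame the frontend renders as the 'AICO is thinking' indicator."""
    return ChatMessage(
        type="typing",
        sender="aico",
        text="on" if on else "off",
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

Keeping every frame in one envelope type lets the Flutter side dispatch on a single `type` field instead of juggling separate channels for messages, typing state, and availability.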

### Voice Interaction
- Speech-to-Text: Local Whisper.cpp integration for voice input
- Text-to-Speech: Local Coqui/Piper for voice output
- Voice Controls: Push-to-talk and voice activation
- Audio Processing: Noise reduction and audio quality optimization
- Voice Settings: Voice selection, speed, and volume controls
- Multimodal Input: Seamless switching between text and voice

### Basic Avatar
- Simple Avatar: Basic 3D avatar in WebView (Ready Player Me)
- Idle Animation: Basic breathing/blinking idle state
- Speaking Animation: Lip-sync during AICO responses
- Basic Emotions: Happy, neutral, thinking expressions
- Avatar Controls: Mute/unmute, avatar on/off toggle

### User Experience
- Onboarding: Simple welcome flow and personality setup
- Settings: Basic preferences (name, avatar, personality sliders)
- Offline Mode: Graceful degradation when backend unavailable
- Responsive Design: Works on desktop and mobile

## Backend MVP

### LLM Integration
- Ollama Setup: Local LLM model management and inference
- Model Loading: Automatic model download and initialization
- Prompt Engineering: System prompts for personality and context
- Response Generation: Generate contextually appropriate responses
- Resource Management: CPU/memory monitoring for LLM operations
- Fallback Handling: Graceful degradation when LLM unavailable
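Response generation against the local Ollama instance can be sketched as a plain HTTP call; the `/api/chat` endpoint, default port 11434, and `model`/`messages`/`stream` fields match Ollama's HTTP API, though streaming, retries, and the fallback path are omitted here:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_payload(model: str, system_prompt: str, history: list[dict]) -> dict:
    """Assemble an Ollama /api/chat request: the personality system prompt
    goes first, followed by the running conversation history."""
    return {
        "model": model,
        "messages": [{"role": "system", "content": system_prompt}, *history],
        "stream": False,
    }


def generate_reply(payload: dict, timeout: float = 30.0) -> str:
    """POST to the local Ollama instance and return the assistant reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["message"]["content"]
```

Because the personality prompt rides in as the system message on every request, swapping personality configurations never requires reloading the model.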

### Memory System
- Conversation Memory: Store and retrieve conversation history
- User Facts: Remember user preferences, interests, and details
- Context Retrieval: Find relevant past conversations
- Memory Consolidation: Summarize and organize long-term memories
- Semantic Search: Vector-based similarity search for memories
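In the MVP, semantic search is backed by ChromaDB, but the underlying idea is cosine similarity over embeddings. A self-contained sketch with toy two-dimensional vectors (real embeddings have hundreds of dimensions and come from an embedding model):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: list[float],
             memories: list[tuple[str, list[float]]],
             top_k: int = 3) -> list[str]:
    """Return the top_k memory texts most similar to the query embedding."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

The same ranking feeds both context retrieval (pull relevant past conversations into the LLM prompt) and consolidation (cluster similar memories before summarizing).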

### Personality Engine
- Trait System: 5 core personality dimensions (Big Five subset)
- Expression Mapping: Translate traits to communication style
- Consistency Validation: Ensure responses align with personality
- Personality Configuration: User-adjustable personality sliders
- Behavioral Parameters: Warmth, formality, curiosity, proactivity
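Expression mapping can be sketched as two small steps: project trait scores onto the behavioral parameters listed above, then render those parameters as a system-prompt fragment. The trait-to-parameter weights and thresholds below are illustrative assumptions, not a tuned mapping:

```python
def expression_params(traits: dict) -> dict:
    """Map Big Five-style trait scores (0.0-1.0) to behavioral parameters.
    Each parameter is driven by one trait here; a real mapping may blend several."""
    return {
        "warmth": traits.get("agreeableness", 0.5),
        "formality": traits.get("conscientiousness", 0.5),
        "curiosity": traits.get("openness", 0.5),
        "proactivity": traits.get("extraversion", 0.5),
    }


def style_instructions(params: dict) -> str:
    """Render behavioral parameters as a system-prompt fragment for the LLM."""
    lines = []
    if params["warmth"] > 0.6:
        lines.append("Use a warm, supportive tone.")
    if params["formality"] < 0.4:
        lines.append("Keep language casual and conversational.")
    if params["curiosity"] > 0.6:
        lines.append("Ask curious follow-up questions.")
    return " ".join(lines)
```

Because the user-facing sliders edit trait scores rather than prompt text, consistency validation can compare generated responses against the same parameters that produced the prompt.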

### Emotion Recognition
- Text Sentiment: Natural language emotion understanding from user messages
- Voice Analysis: Basic emotion detection from voice tone and patterns
- Behavioral Patterns: User mood and preference learning over time
- Context Awareness: Situational emotion understanding
- Emotion History: Track user emotional patterns and trends
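As a placeholder for text sentiment, even a tiny lexicon check conveys the interface the rest of the pipeline consumes: text in, mood label out. The word lists below are toy assumptions; the MVP would use a trained model rather than keyword matching:

```python
# Toy lexicons -- a real implementation would use a trained sentiment model.
POSITIVE = {"great", "happy", "love", "excited", "thanks"}
NEGATIVE = {"sad", "tired", "angry", "hate", "stressed"}


def detect_mood(text: str) -> str:
    """Classify a user message as positive, negative, or neutral."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Whatever replaces this stub, keeping the same signature means emotion history and behavioral-pattern learning can log its output unchanged.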

### Emotion Simulation
- Basic Appraisal: Simplified Component Process Model for emotion generation
- Emotional States: Core emotions (happy, sad, excited, calm, curious)
- Expression Coordination: Sync emotions across avatar, voice, and text
- Emotional Memory: Remember emotional context of conversations
- Empathetic Responses: Generate emotionally appropriate reactions
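The simplified appraisal step can be sketched as a function from event features to one of the core emotion labels above. A full Component Process Model evaluates a sequence of checks (novelty, intrinsic pleasantness, goal relevance, coping potential, norm compatibility); this sketch keeps only two, with thresholds that are illustrative assumptions:

```python
def appraise(event: dict) -> str:
    """Map a minimal appraisal (valence, novelty) to a core emotion label.
    valence: -1.0 (unpleasant) .. 1.0 (pleasant); novelty: 0.0 .. 1.0."""
    valence = event.get("valence", 0.0)
    novelty = event.get("novelty", 0.0)
    if valence > 0.3:
        return "excited" if novelty > 0.5 else "happy"
    if valence < -0.3:
        return "sad"
    return "curious" if novelty > 0.5 else "calm"
```

The resulting label is what expression coordination fans out to the avatar, voice, and text channels, and what emotional memory stores alongside the conversation.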

### Basic Agency
- Initiative System: Proactive conversation starters and engagement
- Goal Generation: Simple self-formulated objectives (check-ins, learning)
- Check-in Goals: Periodic user wellness and interest check-ins
- Suggestion Engine: Context-based activity and conversation suggestions
- Follow-up Questions: Ask relevant follow-up questions unprompted
- Conversation Continuity: Reference and build on previous conversations
- Curiosity Expression: Show interest in user activities and responses
- Proactive Timing: Intelligent timing for initiatives (not intrusive)
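The non-intrusive timing requirement can be captured as a small gate that every initiative must pass before reaching the user. A sketch with two illustrative rules (the minimum gap and quiet hours are assumed defaults, not tuned values):

```python
from datetime import datetime, timedelta


def should_check_in(last_interaction: datetime, now: datetime,
                    quiet_start: int = 22, quiet_end: int = 8,
                    min_gap: timedelta = timedelta(hours=6)) -> bool:
    """Gate a proactive check-in: enough time must have passed since the
    last interaction, and the current hour must be outside quiet hours."""
    in_quiet_hours = now.hour >= quiet_start or now.hour < quiet_end
    return (now - last_interaction) >= min_gap and not in_quiet_hours
```

Routing every initiative type (check-ins, suggestions, follow-ups) through one gate keeps the intrusiveness policy in a single, user-configurable place.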

### Core Services
- Chat API: REST endpoints for sending/receiving messages
- WebSocket API: Real-time bidirectional communication
- Memory API: Store and retrieve user memories
- Personality API: Get/set personality configuration
- Status API: System health and availability status

## Integration Points

### Message Bus Integration
- LLM Module: Subscribe to conversation events, publish responses
- Memory Module: Subscribe to conversation events, publish memories
- Personality Module: Publish personality parameters and traits
- Emotion Recognition Module: Subscribe to user input, publish detected emotions
- Emotion Simulation Module: Subscribe to events, publish AICO emotional states
- Agency Module: Subscribe to context and emotions, publish initiatives
- Voice Module: Subscribe to TTS requests, publish audio responses
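The module wiring above is topic-based pub/sub. The real bus is ZeroMQ (from Foundation), but the routing pattern can be sketched with a minimal in-process stand-in; the topic names below are assumptions for illustration:

```python
from collections import defaultdict
from typing import Any, Callable


class MessageBus:
    """Minimal in-process pub/sub; a stand-in for ZeroMQ topic routing."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        for handler in self._subs[topic]:
            handler(payload)


bus = MessageBus()
detected: list[Any] = []

# Emotion Recognition subscribes to user input and republishes what it detects.
bus.subscribe("conversation.user_input",
              lambda msg: bus.publish("emotion.detected", {"mood": "positive"}))
# Downstream modules (Emotion Simulation, Agency) subscribe to the result.
bus.subscribe("emotion.detected", detected.append)

bus.publish("conversation.user_input", {"text": "Great news today!"})
```

Because modules only share topic names, each one in the list above can be developed and restarted independently of the others.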

### Data Flow
- User Input: Frontend → API → Message Bus → LLM/Emotion Recognition Modules
- Memory Storage: Conversation events → Memory Module → Database
- Personality Context: Personality Module → LLM Module prompts
- Emotion Context: Emotion Recognition → Emotion Simulation → Expression
- Voice Processing: Voice input → STT → LLM → TTS → Voice output
- Proactive Behavior: Agency Module → Frontend notifications
- Avatar Sync: Emotion states → Avatar expressions and animations

## Validation Criteria

### Core Functionality
- Remembers user preferences across sessions
- Initiates conversations without prompting (agency)
- Shows consistent personality responses
- Recognizes and responds to user emotions
- Expresses appropriate emotions through avatar and voice
- Works completely offline
- Responds within 3-5 seconds

### User Experience
- Smooth chat interface with real-time updates
- Voice interaction works seamlessly with text
- Avatar animations sync with conversation and emotions
- Personality feels consistent and authentic
- Emotional responses feel natural and empathetic
- Proactive behavior feels natural, not intrusive
- Easy setup and configuration

### Emotional Intelligence
- Detects user mood from text and voice
- Responds empathetically to user emotions
- Shows appropriate emotional expressions
- Remembers emotional context of conversations
- Adapts communication style based on user mood

## Technical Requirements
- Frontend: Flutter for cross-platform UI
- Backend: Python with FastAPI for core services
- LLM: Local Ollama instance (Llama 2 or similar)
- Voice: Whisper.cpp (STT) + Coqui/Piper (TTS)
- Storage: SQLite for memory, ChromaDB for embeddings
- Message Bus: ZeroMQ pub/sub (from Foundation)
- Avatar: Three.js + Ready Player Me + TalkingHead.js
- Emotion: Basic Component Process Model implementation
- Personality: Big Five trait system with expression mapping

## Success Definition

User can have a 10-minute conversation in which AICO:

1. Memory: Remembers something from earlier in the conversation
2. Agency: Asks a follow-up question unprompted and initiates new topics
3. Personality: Shows consistent personality traits and communication style
4. Emotion Recognition: Detects and responds appropriately to the user's mood
5. Emotion Expression: Displays appropriate avatar expressions and voice tone
6. Voice Interaction: Seamlessly handles both text and voice input/output
7. Contextual Intelligence: Makes relevant suggestions based on conversation context
8. Proactive Engagement: Demonstrates curiosity and genuine interest in user responses