Agency Metrics & State Visibility

This document lists the key metrics, states, and KPIs of the Agency system, and notes which are already exposed and which are still work in progress (WIP).

Status

  • Implemented (v1): core agency state is available via REST endpoints under /api/v1/agency/* (see backend/api/agency/router.py), including:
      • GET /api/v1/agency/state
      • GET /api/v1/agency/intentions
      • GET /api/v1/agency/goals
      • GET /api/v1/agency/curiosity
      • GET /api/v1/agency/profile, PUT /api/v1/agency/profile
      • GET /api/v1/agency/policies
      • POST /api/v1/agency/consent, GET /api/v1/agency/consent, DELETE /api/v1/agency/consent/{consent_id}
      • GET /api/v1/agency/events
      • GET /api/v1/agency/reflection/* (runs, lessons, self-model, skill performance, summary)
  • Implemented (v1): basic operational health/metrics snapshots exist as internal tools registered in the agency ToolRegistry (see shared/aico/ai/agency/tools/agency_metrics.py, e.g., tool.agency.metrics.snapshot, tool.agency.detect_stalled_plans).
  • WIP: many “user-facing metrics” below are defined as UX targets and are not all computed/exposed as explicit metrics yet.
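As an illustration of how the read-only endpoints listed above could be queried, here is a minimal sketch using only the Python standard library. The path prefix comes from this document; the base URL, the bearer-token authentication, and the assumption that responses are JSON objects are placeholders and not confirmed by the backend code.

```python
import json
from typing import Optional
from urllib.parse import urljoin
from urllib.request import Request, urlopen

AGENCY_PREFIX = "api/v1/agency/"  # path prefix as listed in this document


def agency_url(base_url: str, resource: str) -> str:
    """Build the full URL for an agency resource such as 'state' or 'reflection/runs'."""
    root = base_url.rstrip("/") + "/"
    return urljoin(root, AGENCY_PREFIX + resource.strip("/"))


def fetch_agency(base_url: str, resource: str,
                 token: Optional[str] = None, timeout: float = 5.0) -> dict:
    """GET an agency resource and decode the JSON body."""
    headers = {"Accept": "application/json"}
    if token:
        # Assumption: the API accepts a bearer token; adjust to the real auth scheme.
        headers["Authorization"] = f"Bearer {token}"
    request = Request(agency_url(base_url, resource), headers=headers, method="GET")
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_agency("http://localhost:8000", "intentions")` would request GET /api/v1/agency/intentions on a hypothetical local deployment.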

Metrics fall into two groups, covered in sections 1 and 2 respectively:

  • User-Facing Metrics – values that can be surfaced directly (or with light explanation) to end users, including non-technical users.

  • Engineering & Debug Metrics – values primarily for developers, operators, and evaluators.

Each table uses these columns:

  • Name – Metric/state identifier.
  • Type – Counter, gauge, enum/state, list, time-series, etc.
  • Scope – Per-user, per-agent (global), per-session, per-goal, etc.
  • Purpose – Why we track it.

1. User-Facing Metrics

1.1 Goals, Intentions & Work in Progress

| Name | Type | Scope | Purpose |
|---|---|---|---|
| primary_focus_intention | single goal summary | per-user | Backed by GET /api/v1/agency/intentions (primary_focus). WIP: temporal smoothing beyond “top scored right now”. |
| active_intentions | list of goal summaries | per-user | Backed by GET /api/v1/agency/intentions (active_intentions). |
| open_goals_total | gauge (count) | per-user | Backed by GET /api/v1/agency/intentions (open_goals_total). |
| hobby_goals_active | list of hobby summaries | per-user | Backed by GET /api/v1/agency/intentions (hobby_goals_active). |
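The four metrics above all come from the same endpoint, so a client can read them in a single pass. The sketch below assumes the response is a JSON object whose top-level keys match the field names cited in the table (primary_focus, active_intentions, open_goals_total, hobby_goals_active); the actual response schema may differ, and the `IntentionsView` type is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class IntentionsView:
    """Illustrative container for the user-facing intention metrics."""
    primary_focus: Optional[Dict[str, Any]] = None
    active_intentions: List[Dict[str, Any]] = field(default_factory=list)
    open_goals_total: int = 0
    hobby_goals_active: List[Dict[str, Any]] = field(default_factory=list)


def parse_intentions(payload: Dict[str, Any]) -> IntentionsView:
    """Map an (assumed) intentions payload onto the four metrics, tolerating missing keys."""
    return IntentionsView(
        primary_focus=payload.get("primary_focus"),
        active_intentions=list(payload.get("active_intentions") or []),
        open_goals_total=int(payload.get("open_goals_total") or 0),
        hobby_goals_active=list(payload.get("hobby_goals_active") or []),
    )
```

Tolerating missing keys keeps a UI usable while some fields are still WIP on the server side.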

1.2 Curiosity, Intrinsic Motivation & Hobbies

| Name | Type | Scope | Purpose |
|---|---|---|---|
| curiosity_level | scalar (e.g. low/medium/high) | per-user | Backed by GET /api/v1/agency/curiosity. |
| curiosity_opportunities | short list of themes | per-user | Backed by GET /api/v1/agency/curiosity. |
| hobby_activity_time | coarse duration per hobby | per-user, per-hobby | Indicate recent engagement with each hobby (e.g., "spent time this week"). |
| hobby_state | enum + summary | per-hobby | Present the status of each hobby (active/paused/completed) with a short description. |

hobby_activity_time and hobby_state are WIP as explicit user-facing metrics; the closest current source of truth is goals + plan/execution history in the agency tables and reflection summaries.

1.3 Memory, World Model & Open Loops

| Name | Type | Scope | Purpose |
|---|---|---|---|
| open_loops_count | gauge | per-user | Let users know how many unresolved threads AICO is tracking. |
| last_consolidation_time | timestamp | per-user | Indicate when the last sleep-like consolidation ran ("last night", "2h ago"). |
| life_area_coverage | coarse breakdown (e.g. low/medium/high per LifeArea) | per-user | Give the user a simple view of how well AICO understands different areas of their life (Health, Work, Relationships, etc.) based on World Model facts. |
| known_conflicts_present | boolean + short summary | per-user | Indicate whether the World Model currently has unresolved contradictions in important domains (e.g. job, relationship, routine) and, optionally, offer to clarify them. |

All items in this section are currently WIP as explicit agency metrics endpoints.
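Since last_consolidation_time is meant to be shown as a relative phrase rather than a raw timestamp, a presentation helper along the following lines could be used. This is a sketch only: the bucket boundaries and wording are assumptions, and a production version would likely need localisation and time-of-day phrases such as "last night".

```python
from datetime import datetime, timedelta, timezone


def describe_elapsed(last_run: datetime, now: datetime) -> str:
    """Turn the time since the last consolidation into a coarse, user-friendly phrase."""
    delta = now - last_run
    if delta < timedelta(minutes=1):
        # Also covers small negative deltas caused by clock skew.
        return "just now"
    minutes = int(delta.total_seconds() // 60)
    if minutes < 60:
        return f"{minutes}min ago"
    hours = minutes // 60
    if hours < 24:
        return f"{hours}h ago"
    days = hours // 24
    return "yesterday" if days == 1 else f"{days} days ago"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(describe_elapsed(now - timedelta(hours=2), now))
```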

1.4 Emotion, Relationship & Style

| Name | Type | Scope | Purpose |
|---|---|---|---|
| emotional_state_current | enum + intensity | per-user | Provide a simple description of AICO’s current emotional stance, when appropriate. |
| relationship_strength | scalar (coarsened) | per-user | Optionally summarise how close/stable AICO perceives the relationship (only if UX-appropriate). |

These are WIP from an agency-metrics standpoint; emotion has its own subsystem and API, and relationship modeling is currently not exposed as a dedicated agency metric.

1.5 Agency Style, Initiative & Safety

| Name | Type | Scope | Purpose |
|---|---|---|---|
| initiative_ratio | scalar (e.g. "rarely / sometimes / often") | per-user | Communicate how often AICO initiates interactions vs just responding. |
| agency_initiated_messages | count (aggregated over a period) | per-user | Provide history of how many conversations AICO started recently. |
| safety_profile | config snapshot (coarse mode) | per-user, per-deployment | Show current safety/value mode (e.g., cautious / balanced / experimental). |
| consent_required_actions | list of pending items | per-user | Surface actions waiting on explicit user approval/consent. |

safety_profile, consent_required_actions, and consent CRUD are partially represented via GET /api/v1/agency/profile, GET /api/v1/agency/policies, and GET /api/v1/agency/consent. Higher-level UX summaries remain WIP.
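To make the coarse initiative_ratio scale concrete, the sketch below derives a label from two counts: conversations started by AICO (agency_initiated_messages) and conversations started by the user. The thresholds (10% and 40%) and the function name are assumptions chosen for illustration; the document does not define how the scalar is computed.

```python
def initiative_label(agent_initiated: int, user_initiated: int) -> str:
    """Bucket the share of agent-initiated conversations into rarely / sometimes / often."""
    if agent_initiated < 0 or user_initiated < 0:
        raise ValueError("counts must be non-negative")
    total = agent_initiated + user_initiated
    if total == 0:
        return "rarely"  # no conversations in the period: nothing was initiated
    share = agent_initiated / total
    if share < 0.10:  # assumed threshold
        return "rarely"
    if share < 0.40:  # assumed threshold
        return "sometimes"
    return "often"
```

Exposing only the label keeps the metric understandable for non-technical users, while the underlying counts remain available for debugging.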

1.6 Self-Reflection, Change & Lifecycle

| Name | Type | Scope | Purpose |
|---|---|---|---|
| reflection_runs | yes/no + last timestamp | per-user | Indicate that AICO has recently reflected on behaviour ("I recently reflected on our week"). |
| behaviour_adjustments | short list of changes | per-user | Summarise what AICO is trying to do differently (high-level strategy changes). |
| lifecycle_phase | enum (ACTIVE / FOCUSED_WORK / IDLE_LIGHT / SLEEP_LIKE / MAINTENANCE) | per-user, per-agent | Expose the current LifecycleState (from the Lifecycle component) and drive room/posture in the 3D flat. |
| embodiment_state | enum (room, posture, activity label) | per-user | Map internal state to visual representation in the avatar and flat. |

Self-reflection visibility is partially backed by:

  • GET /api/v1/agency/reflection/runs
  • GET /api/v1/agency/reflection/lessons
  • GET /api/v1/agency/reflection/self-model
  • GET /api/v1/agency/reflection/summary

Lifecycle and embodiment metrics are currently WIP as agency-owned metrics.
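The relationship between lifecycle_phase and embodiment_state can be pictured as a lookup from phase to a (room, posture, activity) triple. In the sketch below the five phase names are taken from the table above, while the specific rooms, postures, and activity labels are invented for illustration and do not reflect the actual Lifecycle or embodiment components.

```python
from enum import Enum
from typing import Dict, NamedTuple


class LifecyclePhase(str, Enum):
    ACTIVE = "ACTIVE"
    FOCUSED_WORK = "FOCUSED_WORK"
    IDLE_LIGHT = "IDLE_LIGHT"
    SLEEP_LIKE = "SLEEP_LIKE"
    MAINTENANCE = "MAINTENANCE"


class EmbodimentState(NamedTuple):
    room: str
    posture: str
    activity: str


# Hypothetical mapping: every value on the right-hand side is a placeholder.
EMBODIMENT_BY_PHASE: Dict[LifecyclePhase, EmbodimentState] = {
    LifecyclePhase.ACTIVE: EmbodimentState("living_room", "standing", "chatting"),
    LifecyclePhase.FOCUSED_WORK: EmbodimentState("study", "sitting", "working"),
    LifecyclePhase.IDLE_LIGHT: EmbodimentState("living_room", "sitting", "relaxing"),
    LifecyclePhase.SLEEP_LIKE: EmbodimentState("bedroom", "lying", "resting"),
    LifecyclePhase.MAINTENANCE: EmbodimentState("study", "sitting", "tidying up"),
}


def embodiment_for(phase: LifecyclePhase) -> EmbodimentState:
    """Return the visual state used to render the avatar for a lifecycle phase."""
    return EMBODIMENT_BY_PHASE[phase]
```

Keeping the mapping in one table makes it easy to check that every phase has a visual representation.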

2. Engineering & Debug Metrics

The following metrics are primarily intended for developers, operators, and evaluation dashboards. They provide deeper visibility into internal dynamics and performance.

2.1 Goals, Planning & Execution

| Name | Type | Scope | Purpose |
|---|---|---|---|
| goals_by_origin | breakdown (user / curiosity / hobby / maintenance) | per-user | Analyse balance of goal sources. |
| goal_lifecycle_events | event log | per-goal | Debug why goals were created/paused/completed/dropped. |
| plans_active | list of plan IDs + goal IDs | per-user | Inspect which plans are currently executing. |
| plan_depth | gauge (avg steps per plan) | per-agent, time-series | Monitor complexity of planning over time. |
| plan_execution_success_rate | ratio (0-1) | per-user, per-goal type | Evaluate how often planned steps execute successfully. |
| replanning_events | count / log | per-user | Detect instability or frequent context shifts. |
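As a worked example of the two numeric metrics in this table, the following sketch computes plan_depth as the average number of steps per plan and plan_execution_success_rate as the fraction of steps that succeeded. The `PlanRecord` type and its fields are hypothetical; the real plan and execution tables may store this information differently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlanRecord:
    """Hypothetical per-plan summary used only for this example."""
    plan_id: str
    steps_total: int
    steps_succeeded: int


def plan_depth(plans: Iterable[PlanRecord]) -> Optional[float]:
    """Average number of steps per plan, or None when there are no plans."""
    plans = list(plans)
    if not plans:
        return None
    return sum(p.steps_total for p in plans) / len(plans)


def plan_execution_success_rate(plans: Iterable[PlanRecord]) -> Optional[float]:
    """Share of planned steps that succeeded (0-1), or None when no steps exist."""
    plans = list(plans)
    total = sum(p.steps_total for p in plans)
    if total == 0:
        return None
    return sum(p.steps_succeeded for p in plans) / total
```

Returning None instead of 0 for empty input avoids reporting a misleading "0% success" when nothing has been planned yet.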

2.2 Curiosity, World Model & AMS

| Name | Type | Scope | Purpose |
|---|---|---|---|
| curiosity_signals_total | count/time-series | per-user, per-agent | Track overall rate of CuriositySignals emitted by detectors. |
| curiosity_signals_by_type | breakdown by curiosity_type | per-user | Analyse which kinds of curiosity (knowledge_gap, novelty, self_performance, hobby_play) are most active. |
| curiosity_signals_promoted | count/time-series | per-user | Count signals that passed all gates and became CuriositySignalEvents. |
| curiosity_signals_deferred_or_rejected | breakdown (values/ethics / emotion / resources / low_score) | per-user | Understand why curiosity opportunities did not become events or goals. |
| curiosity_goals_created | count/time-series | per-user | Track frequency of intrinsically motivated goals created from CuriositySignalEvents. |
| curiosity_goals_active | gauge (count) | per-user | Monitor how many curiosity- or agent-self-origin goals are currently active. |
| curiosity_goal_outcomes | breakdown (completed / user_rejected / auto_dropped) | per-user | Evaluate how curiosity-driven goals fare and whether users accept or dismiss them. |
| world_model_nodes | gauge (count) | per-agent | Monitor size/growth of the world model. |
| world_model_edges | gauge (count) | per-agent | Track relational complexity of the world model. |
| world_model_facts_by_life_area | breakdown (fact counts per LifeArea) | per-user | Inspect how many WorldStateFacts exist per LifeArea to spot over- and under-represented domains. |
| world_model_conflicts_active | gauge (count) | per-user | Monitor how many conflicting WorldStateFacts are currently unresolved. |
| hypotheses_open | gauge (count) | per-user | Track how many open World Model hypotheses exist about the user. |
| hypothesis_lifecycle_events | event log | per-user, per-hypothesis | Debug how hypotheses move between open/confirmed/rejected and what evidence drove the change. |
| episodic_writes_rate | rate (events/time) | per-agent | Monitor memory write load. |
| semantic_summaries_created | count/time-series | per-user | Track consolidation of long-term summaries. |
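The breakdown-type metrics in this table are essentially counts grouped by a label. As an example, curiosity_signals_deferred_or_rejected could be aggregated as sketched below. The four reason labels come from the table; the assumption that each deferred or rejected signal carries exactly one reason string is mine.

```python
from collections import Counter
from typing import Dict, Iterable

# Reason labels as listed for curiosity_signals_deferred_or_rejected.
REJECTION_REASONS = ("values/ethics", "emotion", "resources", "low_score")


def rejection_breakdown(reasons: Iterable[str]) -> Dict[str, int]:
    """Count deferred/rejected curiosity signals per reason, including zero counts."""
    counts = Counter(reasons)
    unknown = set(counts) - set(REJECTION_REASONS)
    if unknown:
        raise ValueError(f"unknown rejection reasons: {sorted(unknown)}")
    return {reason: counts.get(reason, 0) for reason in REJECTION_REASONS}
```

Always emitting all four keys, even with a count of zero, keeps dashboards stable when a reason does not occur in a given period.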

2.3 Emotion, Personality & Social

| Name | Type | Scope | Purpose |
|---|---|---|---|
| emotion_transitions | event log | per-user | Debug emotional dynamics around key events. |
| relationship_stability | scalar | per-user | Capture volatility of relationship state. |

2.4 Arbiter, Values/Ethics & Safety

| Name | Type | Scope | Purpose |
|---|---|---|---|
| arbiter_decisions_log | event log (candidates + scores + chosen) | per-user | Debug why certain goals were selected or rejected. |
| goal_source_mix_over_time | time-series breakdown | per-user | Analyse balance of user vs agent-self vs maintenance goals over time. |
| actions_blocked_by_policy | count/time-series | per-user, per-agent | Monitor impact of value/ethics policies. |
2.5 Self-Reflection, Scheduler, Resources, Lifecycle & Embodiment

| Name | Type | Scope | Purpose |
|---|---|---|---|
| lessons_generated | count/time-series + list | per-user | Inspect how many lessons AICO is extracting and their content. |
| agency_metrics_snapshot | structured snapshot | per-agent | Backed by tool.agency.metrics.snapshot (WIP: a dedicated REST endpoint that exposes this tool output). |
| stalled_plans_detected | count + list | per-agent | Backed by tool.agency.detect_stalled_plans (WIP: a dedicated REST endpoint that exposes this tool output). |
| scheduled_agency_tasks | list + states | per-agent, per-user | See what background work is queued and its status. |
| agency_resource_usage | gauge (CPU/mem/battery share) | per-agent | Monitor resource cost of agency-related work. |
| lifecycle_state_time_distribution | breakdown (time spent per LifecycleState) | per-agent, time-series | Analyse how much time AICO spends in ACTIVE / FOCUSED_WORK / IDLE_LIGHT / SLEEP_LIKE / MAINTENANCE to tune daily rhythm and background work policies. |
| lifecycle_transitions | event log | per-agent | Debug when and why LifecycleState changed (including reason such as quiet_hours, manual_override, low_battery). |
| tasks_run_vs_deferred_by_lifecycle | breakdown (counts per state and task class) | per-agent | Inspect how often tasks (user_facing, background_light, background_heavy, maintenance) run or are deferred due to LifecycleState flags, to validate Scheduler/Lifecycle integration. |
| embodiment_state_changes | event log | per-user | Debug mapping between agency state transitions and visual updates. |
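One way lifecycle_state_time_distribution could be derived from the lifecycle_transitions log is to sum the time between consecutive transitions per state. The sketch below assumes the log is available as a time-ordered list of (timestamp, new_state) pairs; that shape and the function name are illustrative, not the actual storage format.

```python
from datetime import datetime, timedelta
from typing import Dict, Sequence, Tuple


def time_per_state(transitions: Sequence[Tuple[datetime, str]],
                   end: datetime) -> Dict[str, timedelta]:
    """Sum the time spent in each state; the last state lasts until `end`."""
    totals: Dict[str, timedelta] = {}
    # Pair each transition with the timestamp of the next one (or `end` for the last).
    next_times = [ts for ts, _ in transitions[1:]] + [end]
    for (start, state), stop in zip(transitions, next_times):
        if stop < start:
            raise ValueError("transitions must be sorted and end must not precede them")
        totals[state] = totals.get(state, timedelta()) + (stop - start)
    return totals
```

With transitions to ACTIVE at 08:00, IDLE_LIGHT at 12:00, and ACTIVE again at 13:00, and an end time of 18:00, this yields nine hours in ACTIVE and one hour in IDLE_LIGHT.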

2.6 Conversation & Embodiment

| Name | Type | Scope | Purpose |
|---|---|---|---|
| agency_initiated_messages | count/time-series | per-user | Track how often AICO starts interactions. |
| embodiment_state | enum (room, posture, activity) | per-user | Map internal state to visual representation. |
| embodiment_state_changes | event log | per-user | Debug mapping between agency state and visuals. |