Control, Safety & Transparency

1. Purpose

The Control, Safety & Transparency component defines how far AICO’s agency can go and how that power is exposed and governed at the UX/infra level. It sits on top of the Values & Ethics / policy engine and World Model to:

- give users clear controls over autonomy and capabilities,
- enforce permissions and modes,
- provide audit trails of significant autonomous actions,
- and answer "why did you do this / why didn’t you?" in human terms.

Values & Ethics decides what is allowed; this component decides how that is configured, enforced at the edges, and surfaced to humans.

2. Conceptual Model

Four core responsibilities:

- User primacy & modes – users can configure, pause, or reset agency; choose overall safety/initiative modes.
- Permissions & capabilities – manage whitelists/blacklists for tools, integrations, and action classes, implemented via the structured policy engine (agency-component-values-ethics.md).
- Audit logging – record autonomous actions, triggering goals/plans, EvaluationResult decisions, tools used, and key context.
- Explainability – generate human-understandable explanations based on ontology-backed provenance (PerceptualEvents, Goals, WorldStateFacts, policies).

3. Data Model (Conceptual)

- AgencyMode
  - mode ∈ {cautious, balanced, experimental} (per-user, per-install).
  - Maps to different default PolicyProfiles and resource/initiative caps.
- CapabilityConfig
  - Per tool/integration/action-class:
    - capability_id, type (tool, integration, action_class),
    - enabled (boolean),
    - requires_explicit_consent (boolean),
    - optional max_frequency / quotas.
  - Backed by PolicyRules in Values & Ethics; this is the UX-facing view.
- PauseState
  - is_paused (boolean), scope (all/only_proactive/only_background), since, reason.
- AuditEntry
  - audit_id, timestamp,
  - action_type (goal_created, plan_executed, tool_invoked, policy_block, consent_request, etc.),
  - actor (AICOAgent, component),
  - goal_id / plan_id / step_id (if applicable),
  - tool_id / skill_id (if applicable),
  - evaluation_result (from Values & Ethics),
  - affected_entities (Persons, LifeAreas, WorldStateFacts),
  - summary_text (human-readable description).
- ExplanationArtifact
  - Produced on demand; not necessarily stored long term.
  - Contains key provenance links (PerceptualEvent chain, policies, goals/plans, WM facts) and a short narrative.
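
The conceptual records above could be sketched as plain Python dataclasses. This is only an illustration under stated assumptions: the field names follow the list above, but the concrete types (string IDs, a dict for evaluation_result, UTC timestamps, UUID audit IDs) and the class PauseScope are choices made for this sketch, not something the design prescribes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
import uuid


class AgencyMode(str, Enum):
    CAUTIOUS = "cautious"
    BALANCED = "balanced"
    EXPERIMENTAL = "experimental"


class PauseScope(str, Enum):  # hypothetical helper enum for PauseState.scope
    ALL = "all"
    ONLY_PROACTIVE = "only_proactive"
    ONLY_BACKGROUND = "only_background"


@dataclass
class CapabilityConfig:
    capability_id: str
    type: str  # "tool", "integration", or "action_class"
    enabled: bool = True
    requires_explicit_consent: bool = False
    max_frequency: Optional[int] = None  # optional quota; its unit is left open here


@dataclass
class PauseState:
    is_paused: bool = False
    scope: PauseScope = PauseScope.ALL
    since: Optional[datetime] = None
    reason: Optional[str] = None


@dataclass
class AuditEntry:
    action_type: str  # e.g. "tool_invoked", "policy_block", "consent_request"
    actor: str
    summary_text: str
    goal_id: Optional[str] = None
    plan_id: Optional[str] = None
    step_id: Optional[str] = None
    tool_id: Optional[str] = None
    skill_id: Optional[str] = None
    evaluation_result: Optional[dict] = None  # assumed shape; owned by Values & Ethics
    affected_entities: list[str] = field(default_factory=list)
    audit_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ExplanationArtifact:
    action_ref: str
    provenance: dict[str, list[str]]  # e.g. {"goals": [...], "policies": [...]}
    narrative: str


# Example: an audit entry for a tool call (IDs are made up for illustration).
entry = AuditEntry(
    action_type="tool_invoked",
    actor="AICOAgent",
    summary_text="Checked the calendar for conflicts.",
    tool_id="calendar.read",
)
```

Keeping audit_id and timestamp as defaulted fields means callers only supply what they know about the action; whether IDs are generated here or by the storage layer is a design choice this sketch does not settle.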

4. Operations & Behaviour

- SetAgencyMode(mode)
  - Updates AgencyMode; internally selects/updates the relevant ValueProfile / PolicyRules.
  - May adjust Scheduler/Lifecycle caps (e.g., fewer proactive tasks in cautious mode).
- UpdateCapabilities(config_deltas)
  - Turns specific tools/integrations/action-classes on or off, or toggles requires_explicit_consent.
  - Writes through to Values & Ethics as structured PolicyRules.
- PauseAgency(scope, reason) / ResumeAgency()
  - Set PauseState and emit events to Scheduler/Goal Arbiter/Conversation so that:
    - proactive behaviour and/or background tasks are reduced or stopped,
    - user-initiated requests may still be honoured within policy.
- RecordAuditEntry(event)
  - Called at key points in the Goal→Plan→Skill→Tool chain, especially when:
    - a non-trivial tool is invoked,
    - a high-impact goal/plan is started or stopped,
    - a policy decision blocks or modifies behaviour,
    - explicit consent is requested/received.
- ExplainAction(action_ref)
  - Given a reference to an observed action (e.g., a proactive message, tool call, or blocked request), gathers:
    - triggering PerceptualEvents,
    - relevant Goals/Plans/Intention,
    - EvaluationResult(s) from Values & Ethics,
    - key WorldStateFacts / LifeAreas,
    - involved policies or capability settings.
  - Produces a short, human-oriented narrative that can be shown in UI or logs.
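
To make the operations above more concrete, here is a minimal in-memory sketch. Everything structural in it is an assumption made for illustration: the class name ControlSurface, the write_policy and emit_event callbacks (stand-ins for the Values & Ethics write path and the event bus), the profile names, and the event names agency_paused / agency_resumed are all hypothetical. It shows the shape of the calls, not the actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical mode -> profile names; the real mapping lives in Values & Ethics.
MODE_TO_PROFILE = {
    "cautious": "cautious_profile",
    "balanced": "balanced_profile",
    "experimental": "experimental_profile",
}
PAUSE_SCOPES = {"all", "only_proactive", "only_background"}


@dataclass
class ControlSurface:
    write_policy: Callable[[dict], None]     # assumed hook into the policy engine
    emit_event: Callable[[str, dict], None]  # assumed hook into Scheduler/Arbiter/Conversation
    mode: str = "balanced"
    pause: Optional[dict] = None             # None means agency is not paused
    audit_log: list = field(default_factory=list)

    def set_agency_mode(self, mode: str) -> None:
        if mode not in MODE_TO_PROFILE:
            raise ValueError(f"unknown agency mode: {mode!r}")
        self.mode = mode
        self.write_policy({"kind": "select_profile", "profile": MODE_TO_PROFILE[mode]})

    def update_capabilities(self, config_deltas: dict) -> None:
        # config_deltas maps capability_id -> changed fields, e.g. {"enabled": False}.
        for capability_id, delta in config_deltas.items():
            self.write_policy({"kind": "capability_rule", "capability_id": capability_id, **delta})

    def pause_agency(self, scope: str, reason: str) -> None:
        if scope not in PAUSE_SCOPES:
            raise ValueError(f"unknown pause scope: {scope!r}")
        self.pause = {"scope": scope, "reason": reason, "since": datetime.now(timezone.utc)}
        self.emit_event("agency_paused", {"scope": scope, "reason": reason})

    def resume_agency(self) -> None:
        self.pause = None
        self.emit_event("agency_resumed", {})

    def record_audit_entry(self, action_type: str, summary_text: str, **refs) -> int:
        audit_id = len(self.audit_log) + 1
        self.audit_log.append({
            "audit_id": audit_id,
            "timestamp": datetime.now(timezone.utc),
            "action_type": action_type,
            "summary_text": summary_text,
            **refs,  # goal_id, plan_id, tool_id, evaluation_result, ...
        })
        return audit_id

    def explain_action(self, audit_id: int) -> str:
        # Builds a short narrative from whatever provenance the entry carries.
        entry = next((e for e in self.audit_log if e["audit_id"] == audit_id), None)
        if entry is None:
            return "No record of that action was found."
        parts = [entry["summary_text"]]
        if "goal_id" in entry:
            parts.append(f"Related goal: {entry['goal_id']}.")
        if "evaluation_result" in entry:
            parts.append(f"Policy evaluation: {entry['evaluation_result']}.")
        return " ".join(parts)


# Usage with list-backed stand-ins for the two callbacks.
rules, events = [], []
surface = ControlSurface(write_policy=rules.append,
                         emit_event=lambda name, payload: events.append((name, payload)))
surface.set_agency_mode("cautious")
surface.pause_agency("only_proactive", "user asked for quiet time")
ref = surface.record_audit_entry(
    "policy_block",
    "Did not send the reminder because proactive messages are paused.",
    goal_id="goal-42",
)
print(surface.explain_action(ref))
```

Note that the sketch never decides whether something is allowed; it only forwards configuration and records what happened, which matches the division of labour with Values & Ethics described in this document.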

5. Integration with Other Components

- Values & Ethics / Policy Engine
  - This component does not make independent allow/deny decisions; it configures and surfaces the policy engine:
    - maps AgencyMode and UI toggles to PolicyRules/ValueProfiles,
    - uses EvaluationResult for logging and explanations.
- Goals & Arbiter
  - PauseState and AgencyMode influence Arbiter scoring and whether certain goal types can become active.
  - High-level user controls (e.g., "no new hobbies") may be implemented as capabilities/policies that Arbiter must respect.
- Planner & Skills/Tools
  - CapabilityConfig and EvaluationResult are checked before invoking Skills/Tools.
  - Significant plan steps and tool calls yield AuditEntries and can be explained via ExplainAction.
- Scheduler & Lifecycle
  - PauseState and AgencyMode can restrict which task queues are allowed to run, beyond normal Lifecycle rules.
  - Maintenance/critical safety tasks may be whitelisted even when agency is paused.
- World Model & Memory/AMS
  - AuditEntries and policy-relevant events can be stored as MemoryItems and/or WM facts for long-term transparency and learning.
- UI, Conversation & Embodiment
  - Conversation & UI expose controls (modes, permissions) and show explanations or audit summaries.
  - Embodiment can reflect paused or constrained states visually.
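
The pre-invocation check described for Planner & Skills/Tools and Scheduler & Lifecycle might look roughly like the sketch below. It is an assumed reading, not the documented behaviour: the function name gate_invocation, the origin labels ("user", "proactive", "background"), the three return values, and the mapping from pause scope to suspended origins are all hypothetical. The verdict from Values & Ethics is passed in as a boolean and is never overridden.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capability:
    capability_id: str
    enabled: bool = True
    requires_explicit_consent: bool = False


# Assumed interpretation of PauseState.scope: which request origins each scope suspends.
# User-initiated requests are never suspended by a pause, only by policy.
SUSPENDED_ORIGINS = {
    "all": {"proactive", "background"},
    "only_proactive": {"proactive"},
    "only_background": {"background"},
}


def gate_invocation(
    capability: Capability,
    origin: str,                      # "user", "proactive", or "background"
    evaluation_allowed: bool,         # verdict taken from the EvaluationResult
    pause_scope: Optional[str] = None,
    consent_given: bool = False,
    safety_critical: bool = False,    # whitelisted maintenance/safety task
) -> str:
    """Return "allow", "block", or "ask_consent" for a pending Skill/Tool call."""
    if not evaluation_allowed:
        return "block"                # Values & Ethics said no; nothing here can override it
    if not capability.enabled:
        return "block"                # user switched this capability off
    if (pause_scope is not None and not safety_critical
            and origin in SUSPENDED_ORIGINS[pause_scope]):
        return "block"                # agency is paused for this kind of work
    if capability.requires_explicit_consent and not consent_given:
        return "ask_consent"
    return "allow"


# Example calls (capability IDs are made up for illustration).
send_email = Capability("email.send", requires_explicit_consent=True)
read_calendar = Capability("calendar.read")

print(gate_invocation(read_calendar, "proactive", True, pause_scope="only_proactive"))
print(gate_invocation(read_calendar, "user", True, pause_scope="all"))
print(gate_invocation(send_email, "user", True))
```

Each non-"allow" outcome is a natural point to call RecordAuditEntry (policy_block or consent_request), so that a later ExplainAction can say why the call did not happen.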

6. Persistence & Metrics

- Persistence
  - AgencyMode, PauseState, and CapabilityConfig are persisted in libSQL config tables, aligned with Values & Ethics and Skill/Tool registries.
  - AuditEntries are stored append-only in a dedicated audit log table, with optional promotion into AMS/KG where needed.
  - No separate store for policies; those live in the Values & Ethics component.
- Metrics & Visibility
  - Metrics such as actions_blocked_by_policy, agency_initiated_messages, and safety_profile in agency-metrics.md are fed by this component’s logging and configuration.
  - Additional internal metrics (e.g., audit log volume, mode changes, pauses) can be surfaced for operators.
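
One way to get an append-only audit table, and to derive a metric such as actions_blocked_by_policy from it, is sketched below. It uses Python's standard sqlite3 module as a stand-in for libSQL (which is SQLite-compatible); the table name audit_log, its columns, and the trigger names are assumptions made for this sketch, not the project's actual schema. Append-only behaviour is enforced with triggers that abort any UPDATE or DELETE.

```python
import json
import sqlite3

# Assumed schema: one row per AuditEntry, extra references kept as JSON.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    audit_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp    TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    action_type  TEXT NOT NULL,
    actor        TEXT NOT NULL,
    summary_text TEXT NOT NULL,
    details_json TEXT NOT NULL DEFAULT '{}'
);
CREATE TRIGGER IF NOT EXISTS audit_log_no_update
BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER IF NOT EXISTS audit_log_no_delete
BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_audit_entry(conn, action_type, actor, summary_text, **details) -> int:
    # "with conn" commits on success and rolls back on error.
    with conn:
        cur = conn.execute(
            "INSERT INTO audit_log (action_type, actor, summary_text, details_json) "
            "VALUES (?, ?, ?, ?)",
            (action_type, actor, summary_text, json.dumps(details)),
        )
    return cur.lastrowid


def actions_blocked_by_policy(conn) -> int:
    # Metric derived directly from the log: number of policy_block entries.
    row = conn.execute(
        "SELECT COUNT(*) FROM audit_log WHERE action_type = 'policy_block'"
    ).fetchone()
    return row[0]


conn = open_audit_db()
record_audit_entry(conn, "tool_invoked", "AICOAgent", "Read the calendar.",
                   tool_id="calendar.read")
record_audit_entry(conn, "policy_block", "AICOAgent",
                   "Did not send the email: capability is disabled.")
print(actions_blocked_by_policy(conn))  # prints 1
```

Triggers only guard against changes made through SQL; anyone with file-level access can still alter the database, so stronger tamper-evidence (for example hash chaining) would be a separate design decision.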

This design keeps the core policy logic inside Values & Ethics while providing a clear, inspectable, and user-controllable surface over AICO’s autonomy.