System Architecture

The actors
Six things participate in this system. They do not all run at the same time, and most of them do not talk to each other directly.
Human operator. Sets goals, reviews results, approves risky actions, and makes final domain decisions. The harness reduces the operator's workload but never removes the operator from decisions that matter.
Claude Code (Opus 4.6). The orchestrator. Runs interactively during sessions, delegates to other models, reads and writes memory, manages state files, and makes tool calls. When the system needs judgment, Claude provides it. When the system needs speed or specialized capability, Claude delegates.
Codex (GPT-5.4). Handles bounded, parallel work. Code audits, builds, validation runs, and design tasks get dispatched through a wrapper script that constrains filesystem access and budget. Codex does not see the operator's conversation. It receives a prompt file and returns a response file.
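The prompt-file-in, response-file-out contract can be sketched as a small Python wrapper. This is a hypothetical illustration, not the actual wrapper script. The `dispatch` name, the rule that both files must sit under an `allowed_root` directory, the character-count budget, and the injected `backend` callable (standing in for the real Codex invocation) are all assumptions.

```python
from pathlib import Path
from typing import Callable


def dispatch(prompt_file: Path, response_file: Path, allowed_root: Path,
             max_prompt_chars: int, backend: Callable[[str], str]) -> str:
    """Hypothetical wrapper: read a prompt file, call a worker model, write a response file."""
    root = allowed_root.resolve()
    for p in (prompt_file, response_file):
        # Assumed filesystem rule: both files must live under allowed_root.
        if not p.resolve().is_relative_to(root):
            raise PermissionError(f"{p} is outside {root}")
    prompt = prompt_file.read_text()
    # Assumed budget rule: refuse prompts above a size cap.
    if len(prompt) > max_prompt_chars:
        raise ValueError("prompt exceeds budget")
    # The worker sees only the prompt text, never the operator's conversation.
    response = backend(prompt)
    response_file.write_text(response)
    return response
```

Passing the backend in as a callable keeps the sketch testable without a model. A real wrapper would likely need stronger isolation than a path check, such as a sandboxed process.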
Grok (xAI). Handles live web research. Searches the web, synthesizes findings, and flags uncertainty. Claude does not search the web. When the harness needs current information, it delegates to Grok.
Local model. Runs on the same machine for triage and analysis at no per-call API cost. The maintenance system sends health check reports to this model for prioritized assessment. If the local model is down, the system falls back to Grok automatically.
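The fallback rule reduces to a try/except around the local call. A minimal sketch, assuming both models are wrapped as hypothetical callables that take the report text and return an assessment, and that an unreachable local model surfaces as `ConnectionError` or `TimeoutError`:

```python
from typing import Callable, Tuple


def triage(report: str,
           local: Callable[[str], str],
           remote: Callable[[str], str]) -> Tuple[str, str]:
    """Send a health-check report to the local model; fall back to the remote model if it is down."""
    try:
        return "local", local(report)
    except (ConnectionError, TimeoutError):
        # Local model unreachable or unresponsive: use the remote fallback instead.
        return "remote", remote(report)
```

Returning which model answered lets the caller record the fallback rather than hide it.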
Cron scheduler. The heartbeat. Over 40 jobs run throughout the day on fixed schedules. Automated tasks, monitors, trackers, health checks, log watchers, and report generators all fire without human intervention. Cron is more reliable than session-based scheduling because its schedule lives outside any agent session, so it survives restarts, crashes, and closed laptops.
Control plane vs work plane
The harness separates two concerns:
The control plane handles orchestration, memory, guardrails, and state management. It answers: what should happen, who should do it, what happened last time, and is this action safe.
The work plane handles the actual tasks. Automated jobs score inputs, monitors track live state, validation runs test assumptions, and research agents search the web. The work plane does useful things. The control plane makes sure those things are done correctly and remembered.
This separation matters because the workload scripts change often, but the memory system, audit hook, and scheduling infrastructure barely change at all. New workflows get added to the work plane. The control plane stays stable.
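One way to picture the split is a stable runner (control plane) that wraps any registered task (work plane) and records its outcome to a file. A minimal Python sketch; `JOBS`, `job`, `run_job`, `score_inputs`, and the `<name>.last_run.json` file naming are hypothetical:

```python
import json
import time
from pathlib import Path
from typing import Callable

# Work plane: task functions that change often. Registering one adds a workflow.
JOBS: dict[str, Callable[[], dict]] = {}


def job(name: str):
    """Decorator that registers a work-plane task under a name."""
    def register(fn: Callable[[], dict]) -> Callable[[], dict]:
        JOBS[name] = fn
        return fn
    return register


# Control plane: one stable runner that records what happened for every job.
def run_job(name: str, state_dir: Path) -> dict:
    started = time.time()
    try:
        result = {"status": "ok", "output": JOBS[name]()}
    except Exception as exc:  # record the failure instead of losing it
        result = {"status": "error", "error": repr(exc)}
    result.update(job=name, started=started, finished=time.time())
    (state_dir / f"{name}.last_run.json").write_text(json.dumps(result, indent=2))
    return result


@job("score_inputs")
def score_inputs() -> dict:
    return {"scored": 3}  # placeholder work
```

Adding a workflow means registering one more function; `run_job` and its record format stay the same.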
State stores
State lives in files, not inside any model. This is a deliberate choice. Files persist across sessions, models, and restarts. Model context windows do not.
conversation_state.json. The thread ledger. Tracks active and closed threads with summaries, next steps, and metadata. When a new session starts, this file tells the agent what work is in progress across all channels.
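A session-start read of the ledger might look like this. The schema here (a top-level `threads` list whose entries carry `id`, `status`, `channel`, `summary`, and `next_steps`) is an assumption for illustration, since the actual field names are not specified above.

```python
import json
from pathlib import Path


def active_threads(path: Path) -> list[dict]:
    """Return active threads from the ledger, or [] if the file does not exist yet."""
    if not path.exists():
        return []
    ledger = json.loads(path.read_text())
    return [t for t in ledger.get("threads", []) if t.get("status") == "active"]


def session_briefing(path: Path) -> str:
    """Format active threads as a short briefing for the start of a session."""
    lines = []
    for t in active_threads(path):
        steps = "; ".join(t.get("next_steps", [])) or "none recorded"
        lines.append(f"[{t.get('channel', '?')}] {t['id']}: {t.get('summary', '')} (next: {steps})")
    return "\n".join(lines)
```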
memory/*.md files. Long-term memory. Each file has YAML frontmatter with a name, description, and type (user, feedback, project, or reference). MEMORY.md is a short index that gets loaded every session. Individual memory files are loaded on demand when relevant.
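The frontmatter check and index can be sketched with the standard library alone. This assumes flat `key: value` frontmatter between `---` lines. The parser is deliberately minimal, and a real YAML parser would be needed for quoting, lists, or multi-line values. `read_frontmatter` and `build_index` are hypothetical names.

```python
from pathlib import Path

MEMORY_TYPES = {"user", "feedback", "project", "reference"}
REQUIRED = {"name", "description", "type"}


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse and validate flat `key: value` frontmatter between leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    else:  # loop finished without finding the closing '---'
        raise ValueError("unterminated frontmatter")
    missing = REQUIRED - set(meta)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if meta["type"] not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {meta['type']}")
    return meta


def build_index(memory_dir: Path) -> str:
    """Render one line per memory file, suitable for a short always-loaded index."""
    entries = []
    for f in sorted(memory_dir.glob("*.md")):
        if f.name == "MEMORY.md":  # the index itself has no frontmatter to read
            continue
        meta = read_frontmatter(f.read_text())
        entries.append(f"- {f.name}: {meta['description']} ({meta['type']})")
    return "\n".join(entries)
```

Keeping only descriptions in the index is what lets it stay short; full files are read only when a description looks relevant.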
JSONL audit logs. Every tool call gets a one-line JSON entry with timestamp, session ID, tool name, file path, blocked/warning status, and a command snippet. This is the forensic record.
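Appending one entry per tool call is a single JSON line write. A sketch with the fields listed above; the key names (`ts`, `session`, `tool`, `path`, `blocked`, `warning`, `cmd`) and the 200-character snippet limit are assumptions:

```python
import json
import time
from pathlib import Path
from typing import Optional


def audit(log_path: Path, session_id: str, tool: str, file_path: Optional[str],
          blocked: bool, warning: bool, command: str, snippet_len: int = 200) -> dict:
    """Append one JSON line describing a tool call to a JSONL audit log."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "session": session_id,
        "tool": tool,
        "path": file_path,
        "blocked": blocked,
        "warning": warning,
        "cmd": command[:snippet_len],  # keep only a snippet, not the full command
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

One object per line means a truncated final write damages at most one entry, and the log can be filtered with line-oriented tools.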
JSON state files. Policy state, throttle state, episode tracking, operational data, input diffs, and validation results all live in dedicated JSON files. Scripts read and write these files. The dashboard reads all of them.
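Because scripts and the dashboard read the same files, a writer should not leave a half-written file behind. The write pattern is not specified above; a common approach, sketched here as an assumption, is to write to a temporary file in the same directory and swap it in with `os.replace`. `write_state` and `read_state` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state(path: Path, data: dict) -> None:
    """Replace a JSON state file atomically so readers never see a partial write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename within the same filesystem
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed before the swap
        raise


def read_state(path: Path, default: dict) -> dict:
    """Read a JSON state file, returning a copy of `default` if it does not exist yet."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return dict(default)
```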
Tracker JSON files. Workflow trackers hold every open and closed work item with full attribution: state transitions, outcome metadata, operator notes, and timestamps.
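A tracker item's state transitions can be recorded as an append-only history on the item itself. The states, the allowed-transition table, and the field names below are hypothetical:

```python
import time

# Hypothetical workflow states and the transitions allowed out of each.
TRANSITIONS = {
    "open": {"in_progress", "closed"},
    "in_progress": {"open", "closed"},
    "closed": set(),
}


def transition(item: dict, new_state: str, note: str = "", **outcome) -> dict:
    """Move a tracker item to a new state and append a timestamped history entry."""
    current = item["state"]
    if new_state not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new_state}")
    item.setdefault("history", []).append({
        "from": current,
        "to": new_state,
        "at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "note": note,  # operator note attached to this transition
    })
    item["state"] = new_state
    if outcome:
        item.setdefault("outcome", {}).update(outcome)
    return item
```

Keeping the history on the item means a closed item still carries the full record of how it got there.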
Design principles
Four rules that shaped every decision in the harness:
Recoverability over cleverness. The system should be easy to restart, easy to debug, and easy to understand after a crash. Files over databases. JSON over binary formats. Explicit state over inferred state.
Explicit files over hidden state. If a fact matters, it goes in a file with a name that describes what it contains. No relying on model memory, conversation history, or assumptions that the agent already knows.
Fail-open on infrastructure, fail-closed on safety. If the audit hook crashes, the tool call still executes. The session does not die because a logging script had a bug. But if a write targets a protected file, the action is blocked regardless of context.
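The two failure modes can be kept separate in a pre-tool hook: the safety check blocks on error, while the logging step swallows its own exceptions. A minimal sketch; the `PROTECTED` patterns, the `WRITE_TOOLS` set, and the hook signature are hypothetical and are not the harness's actual hook interface.

```python
import fnmatch
from typing import Callable

# Hypothetical values; the real protected list and tool names are not given here.
PROTECTED = ["*/.env", "*/secrets/*", "*/settings.json"]
WRITE_TOOLS = {"Write", "Edit"}


def is_protected_write(tool: str, path: str) -> bool:
    return tool in WRITE_TOOLS and any(fnmatch.fnmatch(path, p) for p in PROTECTED)


def pre_tool_hook(tool: str, path: str, log: Callable[[dict], None]) -> bool:
    """Return True to let the tool call proceed, False to block it."""
    try:
        blocked = is_protected_write(tool, path)
    except Exception:
        blocked = True  # fail closed: if the safety check itself errors, block the call
    try:
        log({"tool": tool, "path": path, "blocked": blocked})
    except Exception:
        pass  # fail open: a broken logger must not stop the session
    return not blocked
```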
Skip bureaucracy, not evidence. The development pipeline has stages for a reason. A quick fix does not need a full planning phase. But every code change needs verification (did it compile, did it run) and review (did it touch anything unexpected). Speed changes. Evidence requirements do not.
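The evidence half of the rule can be reduced to a small check that runs however quick the fix was. A sketch, assuming the changed-file list and the set of files the change was expected to touch are already known; `verify_change` is a hypothetical name, only Python files are syntax-checked, and the optional `run_cmd` stands in for whatever "did it run" means for a given project.

```python
import subprocess
from pathlib import Path
from typing import Optional


def verify_change(changed: list[Path], expected: set[Path],
                  run_cmd: Optional[list[str]] = None) -> dict:
    """Gather minimal evidence: it compiles, it runs, and it touched only expected files."""
    compile_errors = {}
    for f in changed:
        if f.suffix == ".py":  # this sketch only syntax-checks Python files
            try:
                compile(f.read_text(encoding="utf-8"), str(f), "exec")
            except SyntaxError as exc:
                compile_errors[str(f)] = f"line {exc.lineno}: {exc.msg}"
    runs = True
    if run_cmd is not None:
        runs = subprocess.run(run_cmd, capture_output=True, timeout=300).returncode == 0
    unexpected = sorted(str(f) for f in changed if f not in expected)
    return {
        "compiles": not compile_errors,
        "compile_errors": compile_errors,
        "runs": runs,
        "unexpected_files": unexpected,
        "ok": not compile_errors and runs and not unexpected,
    }
```

Skipping the planning phase for a quick fix changes nothing here: the same dictionary of evidence is produced either way.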