Scheduling and Operations¶
Cron as heartbeat¶
The harness runs on cron, not on hope.
More than 40 cron jobs fire across the operational day. Cron replaced session-bound loops like /loop, which die when the laptop closes, the session crashes, or the model gets interrupted. Cron survives all of those failure modes. It wakes the scripts on time, whether anyone is in chat or not.
Each job writes to a log. State files capture outputs that need structure. A daily digest reads those state files and logs, then turns them into a dashboard and a summary alert. The system keeps moving without a live operator in front of it.
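The log-plus-state pattern can be sketched as a small wrapper that every cron-invoked job runs through. This is a minimal sketch, not the harness's actual code: the `logs/` and `state/` directories, the `run_job` name, and the JSON layout are all assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical locations -- substitute the harness's real log and state dirs.
LOG_DIR = Path("logs")
STATE_DIR = Path("state")

def run_job(name, job_fn):
    """Run one cron-invoked job: append a log line, write structured state."""
    LOG_DIR.mkdir(exist_ok=True)
    STATE_DIR.mkdir(exist_ok=True)
    started = datetime.now(timezone.utc).isoformat()
    try:
        result = job_fn()
        status = "ok"
    except Exception as exc:  # never let one broken job kill the cron run
        result, status = {"error": str(exc)}, "fail"
    # Log line: a human-readable trail the daily digest can scan.
    with (LOG_DIR / f"{name}.log").open("a") as f:
        f.write(f"{started} {name} {status}\n")
    # State file: structured output for downstream jobs and the digest.
    (STATE_DIR / f"{name}.json").write_text(
        json.dumps({"job": name, "started": started,
                    "status": status, "result": result}))
    return status
```

The point of the wrapper is that a job's failure still produces a log line and a state file, so the digest sees it even when nobody was watching.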
What runs and when¶
The jobs fall into categories, not a rigid timeline. The exact schedule depends on the domain, but the categories are the same in any harness:
Pre-work checks. Health checks, credential verification, dependency status. These run before the operational window opens. If something is broken, the operator finds out before it matters.
Data collection. External signal scans, sentiment analysis, data feeds. These populate the state files that downstream jobs depend on.
Core automation. Screeners, monitors, trackers. The jobs that do the actual work the harness was built for. They read inputs, apply rules, produce outputs, and log everything.
Post-work synthesis. Daily digest, report generation, state cleanup. These summarize what happened and prepare the system for the next cycle.
Continuous monitoring. Log watchers, process health checks, heartbeat uploads. These run throughout the day at short intervals to catch silent failures.
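The continuous-monitoring category mostly reduces to one question: has a log stopped moving? A sketch of a log-freshness watcher, assuming the same per-job `*.log` files as above (directory name and threshold are illustrative):

```python
import time
from pathlib import Path

def stale_logs(log_dir, max_age_seconds):
    """Return (name, age) pairs for logs not written to within the window.

    A silent failure usually shows up first as a log that stops moving,
    which is why a dumb mtime check catches a surprising amount.
    """
    now = time.time()
    stale = []
    for path in sorted(Path(log_dir).glob("*.log")):
        age = now - path.stat().st_mtime
        if age > max_age_seconds:
            stale.append((path.name, int(age)))
    return stale
```

Run at a short interval from cron, this catches the jobs that neither crashed loudly nor produced output.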
Health checks with AI triage¶
The maintenance pass runs a fixed set of deterministic checks. Each check is a simple pass/fail against an observable condition: is this process running, is this log fresh, is this credential valid, is this service reachable.
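A deterministic check pass like the one described can be sketched as a list of named pass/fail functions and a runner that records results without interpreting them. The specific checks here are hypothetical examples, not the harness's real set:

```python
import shutil
import time
from pathlib import Path

# Illustrative checks -- each is a plain pass/fail on something observable.
def check_log_fresh(path, max_age=3600):
    p = Path(path)
    return p.exists() and (time.time() - p.stat().st_mtime) < max_age

def check_binary_present(name):
    return shutil.which(name) is not None

def run_checks(checks):
    """Run every check and record pass/fail. No judgment, just facts --
    prioritization is the triage model's job, not this pass's."""
    report = []
    for name, fn in checks:
        try:
            ok = bool(fn())
        except Exception:
            ok = False  # a crashing check counts as a failing check
        report.append({"check": name, "ok": ok})
    return report
```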
After the checks finish, the results go to the local model for triage. The model reads the raw report and returns a prioritized assessment with likely root causes and suggested fixes. If the local model is down, the system falls back to a cloud model and labels the triage with whichever model produced it. That label matters when the operator reviews the report later.
This pattern works because deterministic checks are reliable but dumb. They catch problems but do not prioritize them. The model adds judgment without adding fragility, because the checks still run and still log even if the model fails.
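The fallback-and-label shape of the triage step can be sketched with the model calls abstracted as injected callables (the function name, the labels, and the report format are assumptions; real model invocation is out of scope here):

```python
def triage(report, local_model, cloud_model):
    """Send the raw check report to the local model; fall back to cloud.

    The result carries a label for whichever model produced it, so the
    operator knows the provenance when reviewing the assessment later.
    """
    for label, model in (("local", local_model), ("cloud", cloud_model)):
        try:
            return {"triaged_by": label, "assessment": model(report)}
        except Exception:
            continue  # model unreachable or errored; try the next tier
    # Both models down: the deterministic report still stands on its own.
    return {"triaged_by": "none", "assessment": None, "raw": report}
```

Note the final branch: triage failing completely still leaves the raw report intact, which is exactly the fragility argument above.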
Alert fatigue¶
The system sends a lot of notifications. Blocks from the audit hook. Warnings from shell safety checks. Maintenance failures. Automation results. Daily summaries.
That volume creates its own risk. If every event looks urgent, none of them do.
Two controls help. Throttling suppresses repeat alert types with a cooldown window. Aggregation rolls broad state into a single daily digest instead of sending individual updates. Those controls reduce noise but do not solve the whole problem. The operator still needs signal discipline from the scripts themselves. A system that alerts on every minor wobble trains its human to ignore alerts entirely.
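The throttling control reduces to remembering when each alert type last fired. A minimal sketch, with the class name and cooldown handling as assumptions:

```python
import time

class AlertThrottle:
    """Suppress repeats of the same alert type within a cooldown window."""

    def __init__(self, cooldown_seconds):
        self.cooldown = cooldown_seconds
        self.last_sent = {}  # alert type -> timestamp of last send

    def should_send(self, alert_type, now=None):
        now = time.time() if now is None else now
        last = self.last_sent.get(alert_type)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown: suppress the repeat
        self.last_sent[alert_type] = now
        return True
```

Throttling is per-type, so a flood of disk warnings cannot drown out the first CPU warning; aggregation then handles everything that is not urgent enough to page on at all.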