
Memory and Continuity

Memory model

The harness keeps memory in files. That choice shapes the whole system.

Each memory note is a Markdown file with YAML frontmatter. The frontmatter carries three fields: name, description, and type. The body holds the durable fact. The file name gives the topic a stable home. The frontmatter makes the file readable by people and easy to index with scripts.
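A note might look like the following sketch. The file name, field values, and body are illustrative, not taken from a real note:

```markdown
---
name: deploy-target
description: Which repository receives production pushes
type: reference
---

Production fixes go to the dedicated repo, not the workspace copy.
```

The frontmatter keys match the three fields named above; everything else here is a made-up example of the shape.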

The harness uses four memory types:

  • user files describe the operator, their habits, and facts that do not belong in source code.
  • feedback files capture corrections and preferences, because those compound over time.
  • project files hold live context for work that spans days or weeks.
  • reference files point to outside systems, credential locations, or infrastructure that the agent must remember without copying the source into code.

These categories are not arbitrary. They map closely to what cognitive scientist John Vervaeke identifies as the four ways of knowing: propositional (knowing that, facts), procedural (knowing how, processes), perspectival (knowing what it's like, viewpoint and preferences), and participatory (knowing by being in relationship, attunement over time). Propositional maps to facts. Procedural maps to project and operational context. Perspectival maps to user preferences. Participatory maps to feedback, where the agent learns how to work with a specific person by accumulating corrections that change its behavior across sessions.

This matters for safety. The memory policy engine in the hosted version of ASH uses these same categories to classify risk. A fact describes. A procedure directs action. The classification determines the trust level: facts are low-risk and stored immediately, while procedures are high-risk and quarantined until a human approves them. The cognitive science framework explains why this classification works. Different kinds of knowing carry different amounts of power over behavior. The policy engine respects that distinction.
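A minimal sketch of that routing logic, assuming the described policy: the four type names come from the harness, but the fact/procedure mapping per type is my inference from the text (user and reference notes describe; project and feedback notes direct action), and the function name is hypothetical.

```python
# Hypothetical sketch of the risk routing described above.
# Mapping each memory type to descriptive (low) or directive (high)
# is an inference from the text, not a confirmed policy table.
RISK_BY_TYPE = {
    "user": "low",       # perspectival: describes the operator
    "reference": "low",  # propositional: pointers and facts
    "project": "high",   # procedural: directs ongoing work
    "feedback": "high",  # participatory: changes behavior
}

def route_note(note_type: str) -> str:
    """Return 'store' for low-risk notes, 'quarantine' for high-risk ones."""
    risk = RISK_BY_TYPE.get(note_type, "high")  # unknown types fail closed
    return "store" if risk == "low" else "quarantine"
```

Failing closed on unknown types mirrors the safety argument: when the system cannot tell how much power a note has over behavior, it should assume the maximum.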

MEMORY.md is the index. The session loader reads it every time. That file should stay short, capped at 200 lines, because it exists to route the agent to the right note, not to store the whole world. If the index grows past that limit, retrieval quality drops and session startup slows down.
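A guard for the cap is a few lines. This is a sketch, not the harness's actual check; only the 200-line limit comes from the text:

```python
from pathlib import Path

MAX_INDEX_LINES = 200  # cap stated in the harness's own guidance

def index_is_oversized(index_path: str) -> bool:
    """Return True when the memory index has grown past the line cap."""
    lines = Path(index_path).read_text(encoding="utf-8").splitlines()
    return len(lines) > MAX_INDEX_LINES
```

Run against MEMORY.md in a pre-session hook, a True result is the signal to push detail out of the index and into topic files.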

How sessions start

Session startup is a script, not a ritual. scripts/guardrails/session_start.py is the entry point. It shells out to the continuity loader, which reads the current thread state and assembles the session context before the agent answers anything.

Two files matter at the start. conversation_state.json tells the agent which threads are open, which are closed, what the last summary said, and what comes next. session_memory_context.md carries retrieved context for the current session when the loader finds relevant memory. If that file exists, the agent reads it and starts from there.
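A loader sketch under those assumptions: the two file names come from the text, while the function name and the returned dict shape are illustrative.

```python
import json
from pathlib import Path

def load_session_context(workdir: str) -> dict:
    """Assemble startup context from the two continuity files, if present.

    Hypothetical sketch: file names follow the harness, return shape does not.
    """
    root = Path(workdir)
    context = {"threads": {}, "memory": ""}

    # Thread ledger: open/closed threads, summaries, next steps.
    state_file = root / "conversation_state.json"
    if state_file.exists():
        context["threads"] = json.loads(state_file.read_text(encoding="utf-8"))

    # Retrieved memory for this session, when the loader found any.
    memory_file = root / "session_memory_context.md"
    if memory_file.exists():
        context["memory"] = memory_file.read_text(encoding="utf-8")

    return context
```

Both reads are optional by design: a missing file means an empty slate, not an error, so a fresh workspace still starts cleanly.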

That handoff removes the usual reset between sessions. The operator does not need to restate what was in progress, which channel it came from, or what the next action should be. The context is already in files.

Thread continuity

conversation_state.json is the thread ledger. It tracks active and closed threads, with fields for title, status, summary, next steps, channels seen, and timestamps such as opened_at and last_activity. The file also stores session-wide context such as the last model and the last session notes.
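A ledger entry might look like this. The field names follow the ones listed above, but the exact key spellings, nesting, and values are illustrative:

```json
{
  "threads": {
    "wiki-deploy-fix": {
      "title": "Fix wiki deploy pipeline",
      "status": "active",
      "summary": "Reproduced the failing build; fix drafted.",
      "next_steps": ["Push fix to the dedicated repo", "Verify live site"],
      "channels_seen": ["webchat", "telegram"],
      "opened_at": "2025-03-20T09:14:00Z",
      "last_activity": "2025-03-21T16:40:00Z"
    }
  },
  "last_model": "example-model",
  "last_session_notes": "Stopped before verification."
}
```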

This file survives session restarts, channel switches (webchat to Telegram), and model changes. The state lives outside the model, so the system never depends on a single context window staying alive.

The file also gives the operator a clean answer to a hard question: what is still open? Without that ledger, half-finished work disappears into chat history.

Feedback: how the agent improves over time

Feedback is the most valuable memory type. Every other type describes the world. Feedback changes the agent's behavior.

When the operator corrects the agent ("don't do that", "check memory first", "push to the correct repo"), the correction gets saved as a feedback_*.md file. The file records three things: the rule itself, why it exists, and how to apply it in future sessions. That structure matters because a rule without context gets applied too broadly, and a rule without application instructions gets ignored.
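A feedback note following that three-part structure might look like this sketch (file name, wording, and frontmatter values are illustrative):

```markdown
---
name: feedback-check-memory-first
description: Check stored memory before asking the operator
type: feedback
---

Rule: Before asking the operator for a fact, search memory for it.
Why: The agent asked for information that was already stored.
Apply: On any question to the operator, run a memory lookup on the
topic first; only ask if the lookup comes back empty.
```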

The pattern works because corrections compound. The first time the agent makes a mistake, the operator catches it. The second time, the memory catches it. Over weeks, the agent accumulates dozens of behavioral corrections that would be impossible to maintain in a system prompt or set of instructions. The corrections are specific ("MkDocs serve needs a manual restart after edits"), contextual ("because live reload doesn't work in our setup"), and durable (they survive session boundaries, model changes, and context window resets).

Skills have their own feedback mechanism. Each skill directory can include a feedback.log file. The skill reads that log before every execution and applies all entries. During execution, new corrections get appended. This keeps skill-level lessons close to the skill itself instead of buried in the general memory index.
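The read-before, append-during cycle can be sketched in a few lines. The feedback.log file name comes from the text; the function names and one-entry-per-line format are assumptions.

```python
from pathlib import Path

def read_feedback(skill_dir: str) -> list[str]:
    """Return all prior corrections for a skill, oldest first.

    Assumes one correction per line; a missing log means no lessons yet.
    """
    log = Path(skill_dir) / "feedback.log"
    if not log.exists():
        return []
    lines = log.read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line.strip()]

def append_feedback(skill_dir: str, correction: str) -> None:
    """Record a new correction so the next execution picks it up."""
    log = Path(skill_dir) / "feedback.log"
    with log.open("a", encoding="utf-8") as f:
        f.write(correction.rstrip("\n") + "\n")
```

Append-only is the point: nothing ever edits an old lesson in place, so a skill's history of corrections reads in the order they were learned.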

Examples of feedback that changed behavior:

  • "Never make the workspace repository public." Saved after a 30-second accidental exposure. Now blocks the action even if the operator requests it.
  • "Check memory before asking the user." Saved after the agent asked for information that was already stored.
  • "Push changes to the dedicated repo, not the workspace copy." Saved after a fix went to the wrong repository.
  • "The wiki server does not auto-reload." Saved after the operator corrected this multiple times in one session.

Without feedback memory, these corrections evaporate between sessions. The agent makes the same mistake on Tuesday that it made on Monday. With feedback memory, the correction persists and the mistake stops recurring. That compounding effect is the single biggest quality improvement in the entire harness.

Writing good memory

Good memory files stay narrow. One topic per file. Short descriptions. Explicit decisions with dates when dates matter. If the note records a rule, name the condition and the outcome. If it records a correction, write the correction in the form future sessions can use.

Do not duplicate the same fact across three notes. Duplication creates drift, and drift creates false confidence. Link to the canonical note from the index and leave it there.

Memory should hold intent, constraints, preferences, and external pointers. Code should hold implementation. Git history should hold the edit trail. A memory file is the right place for "never make this repo public" or "use the dedicated data API for production inputs." It is the wrong place for a line-by-line description of a function change. That belongs in the diff.

Failure modes

The first failure mode is stale memory. A note can outlive the code that replaced it. When that happens, the note wins the retrieval battle and loses the truth test. The fix is simple and boring: verify retrieved context against current code before acting on it.

The second failure mode is duplicate entries. Two files cover the same topic, then one gets updated and the other does not. The operator reads one thing. The agent reads another.

The third failure mode is bloat. Once MEMORY.md turns into a long transcript, the index stops being an index. Keep the index short and push detail into topic files.

The fourth failure mode is over-trust. Retrieved context is a lead, not a verdict. The system still needs a check against source files and state files.

The fifth failure mode is time drift. Relative phrases such as "tomorrow" or "last week" rot fast. Use absolute dates in memory when a date affects action. A note that says "exit on Monday 03/23" still means something next month. A note that says "exit tomorrow" does not.