System architecture
A wiki built by agents. Local-only runtime. Closed-loop control. Budget-aware. Source-disciplined.
This page complements docs/system/sense.md. Sense = what we're building and for whom. Architecture = how the system that builds it is wired.
1. The mental model (one paragraph)
FAIRE is a wiki + curated reading list + nudge-to-build, written by editorial agents that read primary sources only, organized in three layers — curriculum (range, one page per concept), arc index (opinionated path to a frontier capability), arc step (one build per page, MVB as milestone, compounding artifact contract). Voice is reference, not pitch, not tutorial. Citations are seminal · test-of-time · current SotA, nothing else. The system is closed-loop: it observes its own state, decides what to write next, writes it, reviews it through a panel of critics, and learns from the feedback for the next cycle. It runs on a $10/cycle budget, locally.
2. The four contracts
Every page the system writes must honor four contracts simultaneously. A critic owns each.
| # | Contract | Owner skill | Failure mode it catches |
|---|---|---|---|
| 1 | Sense: page matches what FAIRE is for | faire-sense | Tutorial-ish, pitchy, exhaustive-survey style |
| 2 | Human: each of the four readers gets what they came for | critic-human-centered, critic-beginner-onramp | Wall of equations with no intuition; jargon dump |
| 3 | Source: citations are seminal/test-of-time/SotA, approved domains only | source-policy, critic-info-architecture | Medium links, Wikipedia citations, filler readings |
| 4 | Nudge: page ends with a directed, specific invitation to do something | critic-build-nudge, mvb-recipe | "Try training a model" generic CTA |
3. The layers (top to bottom)
┌─────────────────────────────────────────────────────────────────┐
│ Layer 5 — Human interface │
│ mkdocs site (docs/) · /server :8765 dashboard │
└─────────────────────────────────────────────────────────────────┘
▲
│ writes / reads
┌─────────────────────────────────────────────────────────────────┐
│ Layer 4 — Editorial pipeline (LangGraph, per-page) │
│ load_persona → read_stub → research → plan_and_scratch │
│ → build_writing_checklist → write_{draft|arc_step| │
│ arc_index} → link → review (rubric + 8-critic panel) │
│ → revise → review' → keep_best_draft (knockout) → │
│ route_after_review → write_file (H1-fixed, arc- │
│ breadcrumbed) → commit (if conf ≥ 0.7) → log_run │
└─────────────────────────────────────────────────────────────────┘
▲
│ uses
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3 — Skills (agents/skills/*.md) │
│ faire-sense · wiki-prose · math-latex · mvb-recipe │
│ source-policy · sota-coverage · navigation-ia · arc-context │
│ critic-human-centered · critic-beginner-onramp · │
│ critic-wiki-voice · critic-info-architecture · │
│ critic-build-nudge │
└─────────────────────────────────────────────────────────────────┘
▲
│ accesses
┌─────────────────────────────────────────────────────────────────┐
│ Layer 2 — Tools (agents/src/frontier_agents/tools.py) │
│ Exa: papers · sota · production · find_similar │
│ HF: models · datasets │
│ FS: read_stub · write_file · ensure_track_index │
│ Git: git_commit (auto when conf ≥ 0.7) │
└─────────────────────────────────────────────────────────────────┘
▲
│ routed via
┌─────────────────────────────────────────────────────────────────┐
│ Layer 1 — Models (OpenRouter via LangChain ChatOpenAI) │
│ writer · reviewer · research · mvb · fallback │
│ Budget gate: full → reduced → paused │
└─────────────────────────────────────────────────────────────────┘
▲
│ measured by
┌─────────────────────────────────────────────────────────────────┐
│ Layer 0 — Closed control loop (per-cycle) │
│ observer (sensor) → supervisor (controller) → │
│ sprint (actuator, N pages parallel) → runs.jsonl (feedback) │
│ → retrospective (reflector: scrum-style retro + safe │
│ auto-applies stub-seeds) → next cycle's supervisor │
│ Set points: quality 0.85 · coverage 0.80 · staleness 180d │
└─────────────────────────────────────────────────────────────────┘
4. The agent roster
| Agent | What it is | Inputs | Outputs | Lives in |
|---|---|---|---|---|
| Supervisor | Decides what to write next | observer + audit + runs.jsonl | rewrites sprints/current.md | supervisor.py |
| Persona loader | Picks the track's editorial voice | track id | persona dict in state | nodes.py::load_persona_node |
| Stub reader | Picks up any existing draft | output_path | existing_stub | nodes.py::read_stub_node |
| Research agent | 3-channel Exa search (papers/SotA/production) + HF model+dataset lookup | topic + persona search_seeds | research_results, sota_results, production_results, hf_models, hf_datasets | nodes.py::research_node |
| Planner | 5-question planning prompt → 200-word writing plan | research results | writing_plan | nodes.py::plan_and_scratch_node |
| Scratch compiler | Verified fact-sheet (citations, equations, prod examples, MVB stack, opening scenario, open problem) | writing_plan + raw research | scratch_pad (writer never sees raw results) | same node |
| Checklist builder | Promotes scratch_pad facts to mandatory: must-cite papers (arxiv-id resolved), must-use HF model IDs (pre-verified by verify_mvb_stack), must-include equations, must-link concept slugs | scratch_pad | writing_checklist dict | nodes.py::build_writing_checklist_node |
| Writer | Produces a full schema-compliant page draft constrained by the checklist | persona + plan + scratch_pad + checklist | draft | nodes.py::write_{draft,arc_step,arc_index}_node |
| Sanitizer + H1 fixer | Strips fenced YAML and preambles; promotes the first heading after frontmatter to # Topic if writer drifted to ## Topic | draft | sanitized final | nodes.py::_sanitize_draft, _ensure_h1 |
| Linker | Finds related curriculum pages, injects real backlinks; updates backlinks.json | draft + filesystem | draft with injected links | nodes.py::link_node |
| Critic panel (8 critics, parallel) | Each critic-* skill spawns one parallel API call scoring its dimension (info-architecture · beginner-onramp · human-centered · wiki-voice · build-nudge · cohesion · coverage · prerequisites). Combined with structured rubric reviewer + deterministic checklist enforcement + future-arxiv-ID validator | draft + scratch_pad + checklist | per-critic {score, issues, fixes}, aggregated review_confidence | nodes.py::review_node, _run_critic_panel, _aggregate_review |
| Reviser | Up to 2 revision passes on flagged drafts | draft + critic feedback | revised draft | nodes.py::revise_draft_node |
| Knockout selector | After revise + re-review, keeps the higher-confidence of {previous, revised}; restores prior draft if revision regressed by ≥0.02 (PerFine pattern, arxiv 2510.24469) | review_confidence vs prev_review_confidence | draft, review_confidence | nodes.py::keep_best_draft_node |
| Committer | git add + commit if confidence ≥ GIT_COMMIT_THRESHOLD; never-throw-away routing lands ≥0.6 drafts on disk | output_path + confidence | git side-effect | nodes.py::commit_node |
| Logger | Appends run record; recomputes metrics + observer page | full state | runs.jsonl + metrics.json + observer.md | nodes.py::log_run_node |
| Observer | Builds WikiObservation snapshot (sensor) | filesystem + runs.jsonl + OpenRouter | metrics.json + observer.md + budget state | observer.py::observe |
| Audit | Structural scan (banned URLs, missing sections, nested lists, frontmatter) | docs/ | last_audit.json | audit.py::audit_wiki |
| Retrospective (backlog agent) | After every cycle: aggregates deterministic signals (per-track health, recurring critic issues, unresolved wikilinks, heading drift, citation health), runs scrum-style retro through gpt-5-mini with structured output, auto-applies safe items (stub-seeds for high-reference unresolved slugs) | runs.jsonl + supervisor.json + backlinks.json + sprint queue | docs/system/backlog.md, auto-seeded stub files | retrospective.py::retrospective_job |
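The Knockout selector and Committer rows describe two small threshold decisions. The sketch below is one way to read them, not the code in nodes.py: the function names `pick_draft` and `route_draft` are hypothetical, the 0.7 commit threshold is taken from the pipeline diagram, and the behavior below 0.6 (nothing written) is an assumption, since the roster only says what happens at or above it.

```python
REGRESSION_MARGIN = 0.02       # from the Knockout selector row
GIT_COMMIT_THRESHOLD = 0.7     # from the pipeline diagram ("commit (if conf >= 0.7)")
KEEP_ON_DISK_THRESHOLD = 0.6   # from the Committer row ("never-throw-away")


def pick_draft(prev, revised):
    """Choose between two (draft_text, review_confidence) tuples.

    ASSUMPTION: the prior draft is restored only when the revision
    regressed by at least REGRESSION_MARGIN; otherwise the revision wins.
    """
    prev_conf, revised_conf = prev[1], revised[1]
    if prev_conf - revised_conf >= REGRESSION_MARGIN:
        return prev
    return revised


def route_draft(confidence):
    """Decide which side effects a reviewed draft triggers (hypothetical helper)."""
    return {
        "write_file": confidence >= KEEP_ON_DISK_THRESHOLD,
        "commit": confidence >= GIT_COMMIT_THRESHOLD,
    }


if __name__ == "__main__":
    best = pick_draft(("first draft", 0.80), ("revised draft", 0.70))
    print(best[0], route_draft(best[1]))
```

Keeping both thresholds as named constants makes the gap between "lands on disk" and "gets committed" explicit: a draft scoring 0.65 is kept for a later improve pass but never reaches git.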
5. The skills / memory boundary
These are different on purpose:
| Layer | Lives in | Read by | Persists |
|---|---|---|---|
| Agent skills | agents/skills/*.md | LangGraph nodes via skills.py loader | Across cycles; injected into writer/reviewer prompts |
| Personas | agents/src/frontier_agents/personas/{track}.yaml | load_persona_node | Per-track voice; rarely changes |
| Scratch pad | state["scratch_pad"] | Writer, reviser | One run only; discarded after write_file |
| Run log | agents/runs/runs.jsonl | observer, supervisor | All-history; quality trend computed over last 10 |
| Metrics | agents/runs/metrics.json | dashboard, supervisor | Overwritten every run |
| Sprint queue | agents/sprints/current.md | scheduler, supervisor | Rewritten by supervisor every cycle |
| Claude memories | ~/.claude/projects/.../memory/*.md | future Claude conversations | Across sessions; never read by agents |
Rule of thumb: if the agents need it, it's a skill or persona. If future Claude needs it, it's a memory.
6. The file-system contract
Every file in the repo has one of these jobs. Anything else is cruft.
| Path | Role | Owner |
|---|---|---|
| docs/index.md | Public homepage | human, hand-tuned |
| docs/curriculum/{N}/index.md | Track scaffold = the seed for what to write | human seeds; supervisor extends |
| docs/curriculum/{N}/{slug}.md | One concept page | writer agent |
| docs/arcs/index.md | Arc registry / overview | human |
| docs/arcs/{arc}/index.md | One arc syllabus | writer agent (mode=arc-index) |
| docs/arcs/{arc}/step-NN-{slug}.md | One build page (MVB lives here) | writer agent (mode=arc-step) |
| docs/system/sense.md | What FAIRE is | human + Claude |
| docs/system/architecture.md | This page | human + Claude |
| docs/system/observer.md | Live control dashboard | observer agent, auto-overwritten |
| docs/system/supervisor.md | Latest supervisor report | supervisor agent |
| docs/system/changelog.md | Per-page generation log | logger agent |
| docs/system/backlog.md | Sprint retrospectives (scrum-style: went-well, went-wrong, needs-depth, new-to-add, process-improvements) | retrospective agent |
| docs/system/learnings-log.md | Human-curated cross-cycle learnings | human |
| docs/system/backlinks.json | Forward/reverse link index | linker agent |
| agents/sprints/current.md | Work queue | supervisor agent |
| agents/sprints/history/* | Archived sprints | scheduler |
| agents/runs/runs.jsonl | Run record append log | logger agent |
| agents/runs/metrics.json | Latest observation | observer agent |
| agents/skills/*.md | Agent skills | human + Claude |
| agents/.env | Keys + model IDs + budget cap | human only |
| agents/src/frontier_agents/personas/*.yaml | Per-track voice | human + Claude |
| agents/src/frontier_agents/*.py | The system itself | human + Claude |
| PRINCIPLES.md | The 4 objectives + 10 rules | human |
| README.md | Repo intro | human |
7. The self-control mechanisms (closed loop)
This is what makes the system self-controlling rather than merely "automated."
7.1 Set points (the system's goals)
- QUALITY_SETPOINT = 0.85: reviewer confidence per page
- COVERAGE_SETPOINT = 0.80: fraction of pages with real content per track
- STALENESS_THRESHOLD = 180 days: when SotA goes stale
- BUDGET_LIMIT_USD: soft cap on OpenRouter spend
7.2 Error signals (compute_error_signals)
Per observation:
- coverage_deficit = max(0, COVERAGE_SETPOINT - coverage_pct)
- quality_deficit = max(0, QUALITY_SETPOINT - avg_confidence)
- stale_pages, flagged_pages (counts)
- budget_pressure = 1 - remaining / BUDGET_REDUCED
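The formulas above translate almost directly into code. This is a minimal sketch, not the real observer.py: the `Observation` dataclass is a hypothetical stand-in for the WikiObservation snapshot, its field names are assumptions, and the `BUDGET_REDUCED` value of 5.0 USD is a placeholder because the actual threshold is not stated on this page.

```python
from dataclasses import dataclass

QUALITY_SETPOINT = 0.85   # from section 7.1
COVERAGE_SETPOINT = 0.80  # from section 7.1
BUDGET_REDUCED = 5.0      # ASSUMPTION: placeholder USD threshold, real value not stated


@dataclass
class Observation:
    """Hypothetical stand-in for the observer's WikiObservation snapshot."""
    coverage_pct: float     # fraction of pages with real content, 0..1
    avg_confidence: float   # mean reviewer confidence, 0..1
    stale_pages: int
    flagged_pages: int
    remaining_usd: float    # budget still available


def compute_error_signals(obs: Observation) -> dict:
    """Turn one observation into the error signals listed above."""
    return {
        # Deficits are clamped at zero: overshooting a set point is not an error.
        "coverage_deficit": max(0.0, COVERAGE_SETPOINT - obs.coverage_pct),
        "quality_deficit": max(0.0, QUALITY_SETPOINT - obs.avg_confidence),
        "stale_pages": obs.stale_pages,
        "flagged_pages": obs.flagged_pages,
        # As written, this goes negative while remaining > BUDGET_REDUCED.
        "budget_pressure": 1 - obs.remaining_usd / BUDGET_REDUCED,
    }


if __name__ == "__main__":
    print(compute_error_signals(Observation(0.5, 0.76, 3, 2, 2.5)))
```

With the placeholder threshold, a budget of 2.5 USD remaining gives a pressure of 0.5, and pressure reaches 1.0 only when the budget is fully spent.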
7.3 Actuator modes (driven by budget)
- full → claude-opus or gpt-5-class writer, full panel of critics
- reduced → writer drops to FALLBACK_MODEL; skip low-priority improvements
- paused → no generation; only audit + improve for already-generated pages
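The three modes can be pictured as a simple gate on remaining budget. The sketch below is illustrative only: `select_mode`, `MODE_BEHAVIOR`, and both dollar thresholds are hypothetical, since this page names the modes but not the cut-off values or the function that picks them.

```python
# Behavior per mode, paraphrasing the list above. Keys are hypothetical.
MODE_BEHAVIOR = {
    "full":    {"generate": True,  "writer": "primary",  "skip_low_priority": False},
    "reduced": {"generate": True,  "writer": "fallback", "skip_low_priority": True},
    "paused":  {"generate": False, "writer": None,       "skip_low_priority": True},
}


def select_mode(remaining_usd: float,
                reduced_below: float = 5.0,   # ASSUMPTION: placeholder threshold
                paused_below: float = 1.0) -> str:  # ASSUMPTION: placeholder threshold
    """Map remaining budget to an actuator mode, checking the strictest gate first."""
    if remaining_usd <= paused_below:
        return "paused"
    if remaining_usd <= reduced_below:
        return "reduced"
    return "full"


if __name__ == "__main__":
    for remaining in (22.77, 3.0, 0.5):
        mode = select_mode(remaining)
        print(remaining, mode, MODE_BEHAVIOR[mode])
```

Checking the paused gate first means a nearly exhausted budget can never fall through to a generating mode, whatever the reduced threshold is set to.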
7.4 Feedback paths (the loops that actually close)
| Loop | Where it closes |
|---|---|
| Per-run quality (within page) | review fails → revise → re-review → knockout selector keeps higher-confidence draft (max 2 revisions) |
| Per-run hallucination guards | deterministic future-arxiv-ID validator + checklist enforcement inside review_node — pure regex/arithmetic, zero LLM cost |
| Per-cycle quality | runs.jsonl → quality_trend → supervisor adjusts sprint priorities |
| Per-cycle coverage | filesystem stub count + unresolved wikilinks → supervisor queues generate actions |
| Per-cycle retrospective (new) | runs.jsonl + critic_panel + backlinks.json → retrospective_job → backlog.md + auto-seeded stubs → supervisor reads on next cycle |
| Long-horizon voice | (planned) failed-critic patterns → persona YAML diff proposal |
| Budget | OpenRouter /auth/key → check_budget → mode change → sprint_job behavior |
The retrospective loop is the centerpiece — it makes the system learn from itself. Each cycle's scrum retro names what went well, what regressed, what needs depth, and what to add. Safe items (stub-seed for unresolved wikilinks referenced 2+ times) auto-apply; risky items (queue-priority changes, arc proposals, author pages) are queued for human review. The next cycle's supervisor sees the new stubs and the retro context, and the wiki grows in the direction the agents themselves identified as weak.
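The safe auto-apply rule (seed a stub for any unresolved wikilink referenced 2+ times) can be sketched as a counting pass. This is an assumption-laden illustration, not retrospective.py: the function name, the shape of `links_by_page`, and the choice to count every reference rather than distinct referring pages are all hypothetical.

```python
from collections import Counter

MIN_REFERENCES = 2  # "referenced 2+ times", from the paragraph above


def stub_seed_candidates(links_by_page, existing_slugs):
    """Return unresolved slugs that are linked at least MIN_REFERENCES times.

    links_by_page: dict mapping a page slug to the list of slugs it links to
                   (hypothetical shape; backlinks.json may differ).
    existing_slugs: set of slugs that already have a page on disk.
    """
    counts = Counter(
        target
        for targets in links_by_page.values()
        for target in targets
        if target not in existing_slugs
    )
    return sorted(slug for slug, n in counts.items() if n >= MIN_REFERENCES)


if __name__ == "__main__":
    links = {
        "attention": ["softmax", "kv-cache"],
        "transformers": ["kv-cache", "attention"],
        "rope": ["kv-cache", "flash-attention"],
    }
    existing = {"attention", "transformers", "rope", "softmax"}
    print(stub_seed_candidates(links, existing))
```

In the example, `flash-attention` is unresolved but referenced only once, so it stays out of the seed list; only slugs that several pages already depend on are created automatically.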
8. What a real cycle actually cost ($30 top-up, May 2026)
These are measured numbers from the production run that shipped 60 v2 pages across the 10 canonical tracks. Model stack: writer/MVB = openai/gpt-5.1-codex-mini, reviewer = openai/gpt-5-mini, critics = google/gemini-3.1-flash-lite, research = google/gemini-3.5-flash.
| Metric | Value |
|---|---|
| Pages shipped (approved) | 53 / 60 (88%) |
| First-try approval rate | 73% |
| Average confidence | 0.76 |
| Tracks with ≥4 concept pages | 10 / 10 |
| Total spent | $7.41 |
| Per-page cost (incl. revisions) | $0.18 |
| Budget remaining (of $30 top-up) | $22.77 |
| Wall-clock for full sprint | ~3.5 hours @ 4 parallel workers |
| Retrospective cycle (gpt-5-mini, structured output) | $0.04, 25s |
| Auto-seeded stubs from first retro | 7 |
Per-page breakdown holds at roughly: research $0.02 · plan+scratch $0.02 · checklist $0.005 · write $0.05 · link $0.005 · review (rubric + 8 critics in parallel) $0.05 · revise (~27% of pages) $0.03. The closed loop is significantly cheaper than the v1 budget table above because the checklist + knockout selector reduce revision count and the critic panel runs in parallel.
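As a quick arithmetic check, the stage estimates can be summed into an expected per-page cost. The sketch treats the revise step as applying to roughly 27% of pages, which is an interpretation of the figure above, and the dictionary keys are hypothetical labels for the stages.

```python
# Stage estimates in USD, copied from the breakdown above.
STAGE_COST_USD = {
    "research": 0.02,
    "plan_scratch": 0.02,
    "checklist": 0.005,
    "write": 0.05,
    "link": 0.005,
    "review": 0.05,
}
REVISE_COST_USD = 0.03
REVISE_RATE = 0.27  # ASSUMPTION: share of pages that need a revise pass


def expected_page_cost(stage_costs=STAGE_COST_USD,
                       revise_cost=REVISE_COST_USD,
                       revise_rate=REVISE_RATE):
    """Fixed stage costs plus the revise cost weighted by how often it runs."""
    return sum(stage_costs.values()) + revise_rate * revise_cost


if __name__ == "__main__":
    print(f"${expected_page_cost():.3f} per page")
```

The estimate lands at about $0.16 per page, in the same range as the measured $0.18 in the table, which is what "roughly" would lead one to expect.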
9. What's still missing (the honest list)
| # | Gap | Effort | Why it matters |
|---|---|---|---|
| 1 | Critic-attribution → persona update. Failed-critic patterns surface in backlog.md, but the writer's persona YAML isn't yet auto-amended by the retrospective. | Medium | Without this, the same critic flags can recur cycle after cycle even when the retro identified them. |
| 2 | Arc proposal phase. Supervisor should propose arc slates once curriculum coverage stabilises; retrospective already flags candidates but doesn't promote them. | Small | The user picks which arcs become real; matches the "2 active arcs at a time" canon. |
| 3 | Author pages. High-frequency cited researchers (Bengio, He, Vaswani, Pearl…) deserve author pages with their seminal works; retrospective flags this but it's marked moderate-risk and stays human-review. | Small | Compounds citation density and gives every concept page a place to deep-link to. |
| 4 | Visual sanity for math. No automated check that LaTeX renders. | Small | Some pages have unrendered \(...\) because of escaping. |
| 5 | Hybrid local/cloud mode. Critics could run locally (Gemma 3 4B) while writer/reviewer stay cloud — cuts cost ~30%. Local-mode has working scaffold but Gemma 4 MLX is broken upstream. | Medium | Cheap parallel critic fanout without burning OpenRouter budget on every dimension. |
10. The one-line summary
FAIRE is a closed-loop deep-agent system that writes a frontier-AI wiki under a fixed budget, in a single voice, citing only primary sources, and nudges every reader toward making something.
When you change this system, that sentence is the test.