Demo · opchain

Idea → deployed, one chat

Concept → shipped, in one chat

27 steps · 6 stages · ≈ 22 exchanges

01 · Discover

oc-app-architect runs the interview

oc-app-architect

I want to build a Pomodoro app for remote teams. Team members start a focus block, others see who's heads-down, and the app nudges them to break together.

claude oc-app-architect

Recognized a new-app brief — invoking oc-app-architect and loading oc-orchestrator.md. No prior checkpoint found, so I'll start at Phase 1 (Discovery).

I'll run eight questions to lock the shape of the product. You can skip any by saying "skip" and we'll fill defaults from comparable apps.

1. Who's the primary user? (pick one, or describe)

Individual contributor who wants to focus
Team lead who wants visibility into focus time
Ops / HR wanting aggregate productivity data

2. Sync or async? Does the "others see who's heads-down" part update in real time, or is a 30-second refresh good enough?

3. Scale at launch? 5-person team, 50, 500? Affects the stack choice.

Primary user is the IC. Real-time is nice but 30s refresh is fine for v1. Target: teams of 5–30. Self-serve signup. Mobile-first.

claude oc-app-architect

Good — anchoring on the IC, polling rather than websockets, 5-30 person teams, mobile-first, self-serve. Continuing:

4. Team identity — how do people find their team? Invite link, shared workspace slug, SSO?
5. Focus block rules — fixed 25-min pomodoros, or user-chosen duration?
6. Nudges — browser push, Slack, email, in-app only?
7. Paid tier? Free forever, trial, or paywalled from day one?
8. Offline behaviour — must the timer keep running if the tab closes?

Invite links. User picks duration from {15, 25, 45, 60}. Nudges via browser push + Slack later. Free with a soft cap at 50 active teams. Timer must survive tab close — treat as server-side authoritative.

claude oc-app-architect

Writing .checkpoints/oc-app-architect.checkpoint.json — Phase 1 complete.

Product summary (will become the spec's TL;DR):

Heads Down — a team pomodoro tracker. ICs start focus blocks of 15/25/45/60 minutes; teammates see who's heads-down on a shared board. Browser push nudges at block end. Server is the source of truth for timers; closing a tab doesn't stop a session. Free with a 50-team soft cap.
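A minimal sketch of what "server is the source of truth for timers" could mean in code, assuming the server stores a start timestamp and a chosen duration per session. The `FocusSession` shape and both function names are hypothetical, not taken from the generated app.

```typescript
// Hypothetical stored session; field names are assumptions for illustration.
interface FocusSession {
  startedAtMs: number; // server clock, epoch milliseconds
  durationMin: 15 | 25 | 45 | 60; // the four durations from the brief
}

// Remaining time is derived from the stored start time on every request,
// so a closed tab cannot pause or lose the session.
function remainingMs(session: FocusSession, serverNowMs: number): number {
  const endsAtMs = session.startedAtMs + session.durationMin * 60000;
  return Math.max(0, endsAtMs - serverNowMs);
}

// A teammate shows as heads-down while any time remains.
function isHeadsDown(session: FocusSession, serverNowMs: number): boolean {
  return remainingMs(session, serverNowMs) > 0;
}
```

The client only renders whatever `remainingMs` reports, so it never owns the countdown.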

I'll advance to Phase 2 — Spec + Stack. Auto-invoking oc-stack-forge for the stack decision. Type /approve to advance, or push back on anything in the summary.

/approve

02 · Spec + Stack

oc-stack-forge picks the stack

oc-app-architect · oc-stack-forge

claude oc-stack-forge

Invoked by oc-app-architect with the Heads Down brief. I run three scorecards — frontend, backend, and infra — against the constraints (mobile-first, server-authoritative timers, 50-team soft cap, self-serve signup, Slack integration roadmap).

Recommendation — v1 stack:

Layer	Pick	Why
Frontend	SvelteKit + Tailwind	Smallest JS on mobile; no hydration weirdness.
Backend	Cloudflare Workers + D1	Free tier fits 50 teams; server-side timer is trivial with Durable Objects.
Auth	Magic link via Resend	No OAuth friction for a 5-person pilot.
Realtime	30s polling	Matches your spec; websockets deferrable.
Slack later	Incoming webhooks (no bot yet)	One scope, no OAuth consent screen.

Anti-picks: Next.js (overkill), Supabase (stack lock-in this early), Firebase (cost risk at scale), Postgres on RDS (infra we don't need yet).

Writing .checkpoints/oc-stack-forge.checkpoint.json. Control returns to oc-app-architect.

claude oc-app-architect

Spec drafted against the oc-stack-forge output — 11 screens, 4 data models, 3 API surfaces. Full doc is 18 pages; here's the TL;DR of what you'd see on /oc-export-spec:

Screens: team board · new session · session-in-progress · session ended · join-via-link · settings · billing stub · 3 admin views
Models: User, Team, Session, Membership
APIs: POST /sessions/start · POST /sessions/end · GET /teams/:id/board (cacheable, 30s TTL)
Open questions: (1) timezone display — team-local or user-local? (2) push permission UX — prompt on signup, on first session, or nudge card on the board?

Phase 2 checkpoint written. Ready for Phase 3 — Design (auto-invokes oc-ux-engineer). /approve to advance.

User-local for timezone. Push prompt on first session (not signup). /approve

03 · Design

oc-ux-engineer builds the style book

oc-app-architect · oc-ux-engineer

claude oc-ux-engineer

Invoked by oc-app-architect for Phase 3. Since this is a consumer-ish timer app without dense tabular data, I stay solo — oc-dash-forge not needed.

Design system — "Quiet Work":

Palette: Ink #1b1d1f / Mist #e9e6df / Leaf #2f8a57 (focus) / Clay #d96b3a (alert). High contrast for at-a-glance board scanning.
Type: Inter 560/400 for UI, JetBrains Mono for timers.
Key components: FocusCard, TeamBoard, SessionTimer, NudgeToast.
Mobile frame: 390×844; primary CTA bottom-center thumb zone.

Delivered: style book (18 tokens), wireframes (11 screens), Figma-equivalent prompts for hi-fi mocks. Evaluator loop (Generator → Evaluator harness) passes with 88/100 — deductions for mobile nav contrast at 4.4:1 (target 4.5:1).
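The contrast figures above (4.4:1 against a 4.5:1 target) follow the WCAG contrast-ratio formula. A minimal sketch of that calculation, assuming colours arrive as `#rrggbb` strings. Checking Ink against Mist is an illustrative pairing only, since the transcript does not say which pair the mobile nav uses.

```typescript
// WCAG 2.x relative luminance of an sRGB colour written as "#rrggbb".
function relativeLuminance(hex: string): number {
  const channels = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  // Linearise each channel using the piecewise sRGB curve from the WCAG definition.
  const [r, g, b] = channels.map((c) =>
    c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4)
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// True when a pair meets the given target (4.5 is the AA target for normal text).
function meetsTarget(a: string, b: string, target: number = 4.5): boolean {
  return contrastRatio(a, b) >= target;
}
```

Usage: `meetsTarget("#1b1d1f", "#e9e6df")` checks the Ink on Mist pairing against the 4.5:1 target.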

Writing .checkpoints/oc-ux-engineer.checkpoint.json with the style book. /approve to proceed to roadmap.

/approve

04 · Roadmap + Scaffold

sprint plan and runnable skeleton

oc-app-architect

claude oc-app-architect

Sprint plan — 6 sprints to v1:

#	Sprint	Deliverable
1	Auth + team creation	Magic-link signup, team invite URL, empty board.
2	Session timer (server)	Durable Object per session; survives tab close.
3	Team board (polling)	30s refresh, heads-down avatars, state transitions.
4	Nudges	Browser push on session end; graceful fallback.
5	Billing stub	Stripe Checkout + 50-team soft cap gate.
6	Launch polish	Empty-states, 404, analytics wire, privacy page.

Scaffold generated: 47 files, passing pnpm dev, schema migrated, /auth/magic smoke-tested with a local SMTP stub. /approve to kick off Sprint 1.

/approve

05 · Build

Generator → Evaluator sprint loop

oc-app-architect

claude oc-app-architect

Sprint 1 — Auth + team creation. Running the Generator/Evaluator harness.

Generator pass 1 — 23 files touched. Routes, DB schema, magic-link flow, landing page.
Evaluator pass 1 — 72/100. Deductions: email validation too loose, no rate-limit on /auth/request, team-slug collision untested.
Generator pass 2 — tightened validation, added KV-backed rate limit, fuzz test on team-slug.
Evaluator pass 2 — 94/100. Ship it.
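The two passes above suggest a loop of roughly this shape. This is a sketch under stated assumptions: `generate` and `evaluate` are hypothetical callbacks standing in for the harness, and the pass threshold is a parameter because the transcript does not state one.

```typescript
interface Evaluation {
  score: number; // 0..100
  deductions: string[]; // reasons the evaluator took points off
}

interface SprintResult {
  passes: number;
  final: Evaluation;
  passed: boolean;
}

// Runs generate -> evaluate until the score clears passScore or passes run out.
function runSprint<T>(
  generate: (feedback: string[]) => T,
  evaluate: (artifact: T) => Evaluation,
  passScore: number, // assumed threshold, supplied by the caller
  maxPasses: number
): SprintResult {
  let feedback: string[] = [];
  let final: Evaluation = { score: 0, deductions: [] };
  for (let pass = 1; pass <= maxPasses; pass++) {
    const artifact = generate(feedback);
    final = evaluate(artifact);
    if (final.score >= passScore) {
      return { passes: pass, final, passed: true };
    }
    // The next generator pass is steered by the evaluator's deductions.
    feedback = final.deductions;
  }
  return { passes: maxPasses, final, passed: false };
}
```

Feeding the deductions back into the generator is what turns the 72 into a 94 in the exchange above; the cap on passes keeps the loop from running forever when a sprint cannot clear the bar.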

Writing checkpoint. Next sprint? Say /oc-build to continue, /status for the scoreboard, or /approve to auto-run all remaining sprints back-to-back.

Run all remaining sprints. /approve

claude oc-app-architect

Running Sprints 2–6 end-to-end. Streaming the scoreboard:

Sprint 2 · Session timer (Durable Objects)    passed · 91/100
Sprint 3 · Team board + polling               passed · 93/100
Sprint 4 · Browser push + opt-in UX           passed · 88/100  ← 1 re-eval, fixed permission race
Sprint 5 · Stripe Checkout + soft cap         passed · 90/100
Sprint 6 · Empty states + /privacy + GA4      passed · 95/100

All green. Invoking oc-git-ops with the full sprint ledger.

06 · Ship

oc-git-ops → oc-deploy-ops

oc-git-ops · oc-deploy-ops

claude oc-git-ops

Reading oc-app-architect's checkpoint — 6 sprints, 94 passing tests, no outstanding blockers.

Creating branches sprint/1-auth … sprint/6-polish, one PR per sprint, each with generated descriptions + test plan. Draft mode on all six so you can review.

 #14  Sprint 1 · Auth + team creation     draft · +812 −0   · 22 tests
 #15  Sprint 2 · Session timer            draft · +504 −31  · 14 tests
 #16  Sprint 3 · Team board + polling     draft · +388 −18  · 11 tests
 #17  Sprint 4 · Browser push             draft · +221 −12  ·  9 tests
 #18  Sprint 5 · Stripe Checkout          draft · +406 −7   · 13 tests
 #19  Sprint 6 · Polish                   draft · +196 −44  · 25 tests

Handing off to oc-deploy-ops for the preview + staging pipeline.

claude oc-deploy-ops

Reading oc-git-ops checkpoint. Building preview deployments for each PR, running the Cloudflare Workers + D1 migration dry-run, wiring the staging secrets.

 heads-down-staging.workers.dev   deployed · 42 KB worker · TTFB 38ms
 D1 migration dry-run             6 migrations · 0 conflicts · 0 destructive ops
 smoke suite (18 checks)          all green
 security headers                 CSP, HSTS, nosniff, X-Frame-Options all present

Staging is live. Production deploy is one /oc-launch away — but give it 24h soak first. Writing .checkpoints/oc-deploy-ops.checkpoint.json with the staging URL + a rollback runbook.

Soaked fine overnight. /oc-launch

claude oc-deploy-ops

Promoting staging → production. DNS headsdown.app points at the prod worker. Post-deploy smoke: all green. Error budget: 0 requests 5xx'd in the first 60 minutes.

Summary — end of session:

Time elapsed: 3h 41m of chat, mostly spent on design.
Code shipped: 2,527 lines added across 6 PRs, 94 tests passing.
Infra cost: $0 (Workers free tier fits the 50-team soft cap).
Next natural phase: oc-scale-ops advisory when you hit ~30 teams; oc-integrations-engineer whenever you're ready for the Slack bot.

Congrats — you shipped. 🚀

Dense data, designed right

Dashboard rescue — dense data, designed right

13 steps · 3 stages · ≈ 12 exchanges

01 · Triage

oc-ux-engineer reads the screen

oc-ux-engineer

My analytics dashboard is a mess. 14 charts, 4 tabs, users say they "don't know where to look." Here's the screenshot + the current Figma: [image: kitchen-sink-dashboard.png] [figma: https://figma.com/file/abc/saas-analytics] Can we make this actually usable?

claude oc-ux-engineer

Pulled the screenshot and the Figma. This is a data-density problem, not a visual-design problem, so I'll keep driving the overall system and invoke oc-dash-forge for the actual screen redesign in a moment.

First pass — what I see:

14 charts, all ~equal visual weight, zero hierarchy.
5 of 14 encode redundant data (MAU shown 3 different ways).
Axis labels and chart titles are both 14px, so from 10ft away you can't tell a label from a title.
No small multiples — every trend comparison is a full-width chart.
Color: 9 unique hues, none of them tied to a semantic (red used for "good" in one chart, "bad" in another).

Before redesigning, two questions:

1. What's the single question this dashboard has to answer? (every dashboard needs one — if yours is "health of the business," say that; if it's "am I off track this week," say that).
2. Who's the primary user — exec who glances once a week, or IC who lives in it?

Primary question: "Is my team on track to hit this quarter's OKRs?" Primary user: founders + team leads, weekly check-in, mobile 30% of the time.

claude oc-ux-engineer

Clear brief. This is tri-agent territory — I'll invoke oc-dash-forge (the density specialist) with the OKR-focused framing, then I'll grade the output against the style book.

Writing .checkpoints/oc-ux-engineer.checkpoint.json with the brief + handing off.

02 · oc-dash-forge

density specialist takes over

oc-dash-forge

claude oc-dash-forge

Invoked by oc-ux-engineer. Brief: "Is my team on track to hit this quarter's OKRs?" for founders/leads on a weekly cadence, 30% mobile.

Principles I'll apply (Tufte, Few, Cleveland — the canon):

One question, one answer, top-of-fold. Everything else earns its place below.
Small multiples over big singles. 12 tiny charts beat 4 big ones for comparison.
Data-ink ratio. Remove every pixel that isn't data. No 3D. No gradients. No drop shadows on charts.
Color = meaning. Three hues max, each carrying semantic weight.
Consistent baselines. If two charts are comparable, their y-axes share a scale.

Proposed redesign — 3 layers:

Layer 1 (hero): A single horizontal bar per OKR — target line, current, trajectory-at-current-pace. Eight OKRs fit above the fold on mobile.
Layer 2 (context): 2×4 small multiples of the leading indicators, sparkline + last-value + 7d-change. Not charts — data strips.
Layer 3 (deep-dive): Click any OKR → drawer with 3 charts, max. No tabs.
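The "trajectory-at-current-pace" marker in Layer 1 can be read as a linear projection. A minimal sketch, assuming each OKR metric starts the quarter at zero, accumulates over time, and is better when higher. The function names and the two status labels are hypothetical.

```typescript
// Linear projection: assumes progress continues at the average pace so far.
function projectedAtQuarterEnd(
  current: number,
  daysElapsed: number,
  daysInQuarter: number
): number {
  if (daysElapsed <= 0) return current; // nothing to extrapolate from yet
  return (current / daysElapsed) * daysInQuarter;
}

type OkrStatus = "on track" | "slipping";

// An OKR is on track when the projected end-of-quarter value reaches the target.
function okrStatus(
  current: number,
  target: number,
  daysElapsed: number,
  daysInQuarter: number
): OkrStatus {
  const projected = projectedAtQuarterEnd(current, daysElapsed, daysInQuarter);
  return projected >= target ? "on track" : "slipping";
}
```

Metrics where lower is better, or that do not start at zero, would need a different projection; this sketch covers only the simple cumulative case.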

Delivering wireframes + the component spec. Writing .checkpoints/oc-dash-forge.checkpoint.json.

Love it. Two concerns: 1. Some OKRs are leading, some lagging. Users want to know which is which. 2. We also need a "what changed this week" section — not a chart, a list.

claude oc-dash-forge

Both land cleanly in the system.

Leading vs. lagging: each OKR row gets a prefix glyph — ↗ for leading, ⊢ for lagging — tied to a legend tooltip. No new color; glyphs preserve the 3-hue cap.

"What changed this week": a Layer 0, above Layer 1. Three-line summary auto-written by your existing nightly job, rendered as plain text with inline numbers in JetBrains Mono. Example:

This week · Tue 18 Oct
Signups up +18% week-over-week (leading OKR #3 on track).
Retention day-7 flat at 42.1% (lagging OKR #1 slipping — needs attention).
Support tickets down −12% (leading OKR #5 on track).

Three bullets. No chart. Reads like a newsletter. Updating the wireframes, re-running the Evaluator.

Evaluator: 94/100 (deduction: small-multiples baseline alignment off by 2px at 390w; trivial fix).

03 · Handback

oc-ux-engineer integrates + grades

oc-ux-engineer

claude oc-ux-engineer

Resuming from the oc-ux-engineer checkpoint. Reading oc-dash-forge's output.

Integration pass:

Style-book tokens absorbed (no new color; only the semantic mapping of existing neutrals).
Component catalog updated: OkrRow, SmallMultiple, WeekSummary added; old ChartCard deprecated.
Accessibility audit: contrast ratios all ≥ 4.6:1 (your old one had 3 charts below WCAG AA).

Style-book grade on the new screen: 96/100 — vs. 61/100 for the old kitchen-sink version. Net data-ink ratio doubled; the "single question" test passes (an exec who glances once a week sees their OKR status in under 2 seconds).

Writing consolidated checkpoint. Hand back to you — or say /implement and I'll spin up an oc-app-architect sprint to build the new screen in your existing Next.js app.

/implement

claude oc-ux-engineer

Handing off to oc-app-architect with the new design as a single-sprint brief:

Scope: replace /dashboard (one route, no migrations).
New components: 3 (OkrRow, SmallMultiple, WeekSummary).
Data: reuse existing /api/metrics — no backend changes.
Acceptance: Evaluator ≥ 90, Lighthouse mobile ≥ 85, feature-flag behind FLAG_NEW_DASHBOARD so you can A/B against the old one.

oc-app-architect checkpoint written. Ready for you to /oc-build when you are.

Legacy Rails, one new feature

Legacy Rails app, one new feature

15 steps · 3 stages · ≈ 14 exchanges

01 · Survey

oc-reverse-spec reads the code

oc-reverse-spec

Inherited a 3-year-old Rails 6 app called Haulier (freight logistics SaaS). No docs. One engineer who knows it is leaving in 3 weeks. Boss wants us to add a new "Carrier scorecard" feature before she goes. I'm scared to touch it. Where do I start?

claude oc-reverse-spec

The right entry point is oc-reverse-spec, not oc-app-architect — you don't know enough about the existing app to plan a new feature yet. I'll reconstruct the spec from the code so oc-app-architect has a baseline to plan against.

Invoking oc-reverse-spec. Running the 5-phase survey:

Repo-shape scan — languages, frameworks, monolith vs. service, dep health.
Model archaeology — ActiveRecord models + associations + cardinality.
Route inventory — every endpoint grouped by controller.
Domain extraction — business concepts inferred from names + comments.
Risk map — files that are "load-bearing and fragile" (high churn + no tests).
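One way to make "high churn + no tests" concrete is a score that grows with change frequency and shrinks with coverage. This is a hypothetical scoring rule for illustration; the transcript does not say how oc-reverse-spec actually ranks files.

```typescript
interface FileStats {
  path: string;
  commitsLastYear: number; // churn proxy
  lineCoverage: number; // 0..1, fraction of lines exercised by tests
}

// Higher churn and lower coverage both push the score up.
function riskScore(file: FileStats): number {
  return file.commitsLastYear * (1 - file.lineCoverage);
}

// Returns the `top` riskiest files without mutating the input.
function riskMap(files: FileStats[], top: number): FileStats[] {
  return [...files].sort((a, b) => riskScore(b) - riskScore(a)).slice(0, top);
}
```

A fully covered file scores zero no matter how often it changes, which matches the idea that fragility comes from untested change rather than change alone.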

Point me at the repo and I'll go. /scan /path/to/haulier or paste the top-level tree + Gemfile.

/scan ~/work/haulier

claude oc-reverse-spec

Scan complete in 8m 12s. Findings:

Repo shape

Rails 6.1, Ruby 3.1, Postgres 13, Sidekiq 7, 84 gems (11 flagged out-of-date).
47k lines of Ruby, 8k ERB, 2k JS (vanilla + jQuery, no bundler).
312 tests (RSpec), 58% line coverage — decent for a 3-year-old monolith.

Domain — reconstructed from models + controllers + naming:

Haulier is a freight-broker SaaS. Brokers ("Dispatchers") post Loads; Carriers bid; a chosen Carrier hauls; the Broker invoices the Shipper. Side-flows: driver-side Documents upload, daily Fuel-surcharge reindex, weekly Settlement run.

Model graph (28 models, top-level):
Dispatcher ─▶ Load ─▶ Bid ◀─ Carrier · Load ─▶ Shipment ─▶ Document · Shipment ─▶ Invoice ─▶ Payment

Risk map — top 3 files:

app/models/settlement.rb — 812 lines, 6 callbacks, 0 tests. Handles money. 🚨
app/jobs/fuel_reindex.rb — runs nightly, no failure alerting.
app/controllers/api/v2/loads_controller.rb — 19 endpoints, long methods.

Writing .checkpoints/oc-reverse-spec.checkpoint.json — 142-page reconstructed spec included. Ready to hand off to oc-app-architect for the new feature. Continue?

Yes. The feature is a Carrier scorecard — on-time rate, damage rate, invoice dispute rate — shown to dispatchers when they're picking a carrier for a new load.

02 · Plan the new feature

oc-app-architect reads the spec as baseline

oc-app-architect

claude oc-app-architect

Invoked by oc-reverse-spec with the 142-page baseline. Since the app already exists, I skip /oc-discover and go straight to /oc-spec for the new feature, treating the baseline as Phase-1 output.

Feature — Carrier scorecard:

Three metrics, all computable from existing tables:

Metric	Source	Window
On-time rate	`Shipments.delivered_at` vs. promised	last 90d
Damage rate	`Claims` joined on `Shipment`	last 90d
Invoice dispute rate	`Invoices.status = 'disputed'`	last 180d
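To make the first row of the table concrete, here is a sketch of the on-time rate over a trailing window. The app in the transcript is Rails and would compute this in SQL; the TypeScript below only illustrates the arithmetic, and the `ShipmentRow` field names are assumptions.

```typescript
// Hypothetical projection of a shipment row; field names are assumptions.
interface ShipmentRow {
  promisedAt: Date;
  deliveredAt: Date | null; // null while the shipment is still in transit
}

// Share of shipments delivered in the window that arrived on or before the promised time.
function onTimeRate(rows: ShipmentRow[], now: Date, windowDays: number = 90): number | null {
  const cutoffMs = now.getTime() - windowDays * 24 * 60 * 60 * 1000;
  let delivered = 0;
  let onTime = 0;
  for (const row of rows) {
    // Skip undelivered shipments and deliveries older than the window.
    if (row.deliveredAt === null || row.deliveredAt.getTime() < cutoffMs) continue;
    delivered++;
    if (row.deliveredAt.getTime() <= row.promisedAt.getTime()) onTime++;
  }
  // No deliveries in the window: report "no data" rather than a misleading 0%.
  return delivered === 0 ? null : onTime / delivered;
}
```

Returning `null` for carriers with no recent deliveries keeps new carriers from looking like the worst performers.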

Where it renders: dispatcher's loads#new screen, Carrier picker dropdown. Each option shows a tiny 3-glyph strip.

Data work: no new tables. One materialized view carrier_scorecards_v1 refreshed hourly. Query plan benchmarked at ~40ms on the current data volume.

Risk I'm flagging: settlement.rb sits two joins away from invoices. Any query path that touches settlement without tests is a landmine. oc-code-auditor will gate this sprint with an extra-strict pass on that model. Writing .checkpoints/oc-app-architect.checkpoint.json. /approve to advance.

/approve — but add one thing. Dispatchers want to filter the Carrier picker by scorecard. 'Hide carriers with < 80% on-time.'

claude oc-app-architect

Folded in. Adds one URL query param ?min_on_time=80 on the picker endpoint and a toggle in the UI. No backend migration — the materialized view already has the field.

Updated sprint plan — 2 sprints:

#	Sprint	Deliverable
1	Materialized view + backfill + unit tests	`carrier_scorecards_v1`, RSpec @ 95% cov.
2	Picker UI + filter + scorecard strip component	Stimulus controller, component preview tests.

Kicking off Sprint 1 with the Generator/Evaluator harness. Pre-flight: oc-code-auditor spot-check on touch-points. /approve to run.

/approve

03 · Build

with an extra-strict audit gate

oc-app-architect · oc-code-auditor

claude oc-app-architect

Sprint 1 complete.

Materialized view created + scenic gem added (Rails idiom for views).
Backfill migration runs on 1.2M shipment rows in ~14s on a staging dump.
38 RSpec examples, 100% on the new code, settlement.rb untouched.
Evaluator: 93/100. Docs blob on the Rails model is missing — flagged, not blocking.

Sprint 2 complete.

CarrierScorecardStrip component, preview-tested with 5 fixture states.
?min_on_time wired through Ransack-compatible scope.
Stimulus controller for the toggle, 17 lines; keyboard-accessible.
Evaluator: 91/100.

Invoking oc-code-auditor for the gate before PR.

claude oc-code-auditor

Running /oc-audit full on the diff — 612 lines across 14 files.

 security       ✓ no new user input surfaces; no SQL-interpolation paths
 perf           ✓ added EXPLAIN ANALYZE on the hot query (38ms p95)
 correctness    ✓ backfill idempotent; scorecard refresh is a transaction
 style          ✓ RuboCop clean
 tests          ✓ 38 new · 100% on the diff · settlement.rb untouched
 dependencies   ⚠ scenic gem added — verified it's on the current allow-list

 overall grade  A (94/100)

The one amber signal was the new gem; confirmed it's maintained + on our allow-list. Gate: PASS. Handing back to oc-app-architect for the PR.

claude oc-app-architect

PR opened, description auto-written with the baseline spec excerpt + the two sprint summaries + the audit report. You can merge when your one remaining engineer signs off.

Bottom line: the legacy app is now documented (142-page spec in /docs/spec.md), the new feature is ship-ready, and nothing load-bearing was touched. When the engineer leaves in 3 weeks, you have the spec and the sprint ledger, and anybody can pick up from the checkpoints.

Stripe subscriptions by Friday

Ship Stripe subscriptions by Friday

15 steps · 4 stages · ≈ 13 exchanges

01 · Brief

oc-integrations-engineer sizes the job

oc-integrations-engineer

I need Stripe subscriptions in my Next.js app by Friday. Two tiers (Pro $19/mo, Team $49/mo), per-seat billing on the Team tier. App is live, Prisma on Postgres. I've never wired Stripe before. Don't break the app.

claude oc-integrations-engineer

Invoking oc-integrations-engineer with its 5-phase protocol:

Recon — read the target service's docs + SDK health; score risk surfaces.
Contract — write the Integration Contract spec (what enters, what leaves, which failures are recoverable).
Scaffolding — implement the boundary layer with typed mocks.
Live wiring — connect to the real API, feature-flagged.
Proof — end-to-end test + failure-mode harness + audit gate.

Recon on Stripe:

Stripe API version target: 2024-11-20.acacia (current stable).
Node SDK stripe@17.x (official, maintained).
Webhook signature verification is mandatory; any shortcut around it fails the audit.
Test mode keys available; test clocks available for deterministic test data.
Primary risk surfaces: (a) webhook race conditions vs. Checkout redirect, (b) proration math on the Team tier.

Hand me your repo root + the current user/team schema and I'll draft the contract. Paste the schema.prisma excerpt for User and whatever Team / Membership you have.

```prisma
model User {
  id     String  @id @default(cuid())
  email  String  @unique
  teamId String?
  team   Team?   @relation(...)
}

model Team {
  id        String   @id @default(cuid())
  slug      String   @unique
  members   User[]
  createdAt DateTime @default(now())
}
```

Seat count = count of User records where teamId = Team.id.

claude oc-integrations-engineer

Got it. Writing the Integration Contract (Phase 2):

Purpose: turn a Team (or solo User) into a Stripe Subscription; keep seat count synced; surface billing state to the app.

Inputs → Stripe: checkout.sessions.create with subscription mode, customer = stripe_customer_id (new field on User for solo and Team for team), line_items = one of two price_ids.
Outputs from Stripe: webhooks: checkout.session.completed · customer.subscription.updated · customer.subscription.deleted · invoice.payment_failed.
Invariants:

Exactly one active subscription per Team (for Team tier) or User (for Pro).
Seat-count changes reflect in Stripe within 60s via subscription-item quantity update.
Webhook events are idempotent on receipt (primary key on stripe_event_id).
Failed payments → billing_status: 'past_due' on the User/Team; UI gates writes.

New tables: StripeEvent (idempotency ledger), BillingStatus (denormalized for fast reads).
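The seat-count invariant above boils down to a reconcile step: compare the team's member count with the quantity Stripe currently bills and update only when they differ. A sketch of that decision, kept free of network calls. The type names and the one-seat minimum are assumptions; in a real integration the resulting quantity would be sent through the Stripe SDK's subscription-item update call.

```typescript
interface SeatSyncInput {
  memberCount: number; // count of User records where teamId = Team.id
  stripeQuantity: number; // quantity currently on the subscription item
}

type SeatSyncAction = { kind: "noop" } | { kind: "update"; quantity: number };

// Decides whether Stripe needs a quantity update; performs no I/O itself.
function planSeatSync(input: SeatSyncInput): SeatSyncAction {
  // Assumption: a Team subscription always bills at least one seat.
  const desired = Math.max(1, input.memberCount);
  if (desired === input.stripeQuantity) {
    return { kind: "noop" };
  }
  return { kind: "update", quantity: desired };
}
```

Separating the decision from the API call keeps the reconcile logic testable against mocks, which fits the flag-off mode described above.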

Feature flag: FLAG_STRIPE_LIVE — when off, everything runs against mocks in test mode.

Writing .checkpoints/oc-integrations-engineer.checkpoint.json with the contract. /approve to proceed to Phase 3 scaffolding.

/approve

02 · Scaffold + wire

boundary layer first, live keys last

oc-integrations-engineer

claude oc-integrations-engineer

Phase 3 — Scaffolding. Implementing the boundary layer with typed mocks so tests can run without network.

lib/billing/
├─ client.ts         // thin Stripe wrapper, only call site in the app
├─ checkout.ts       // createCheckoutSession(userId | teamId, tier)
├─ webhook.ts        // verifyAndDispatch(req) → {event, handler}
├─ handlers/
│   ├─ checkout-completed.ts
│   ├─ subscription-updated.ts
│   ├─ subscription-deleted.ts
│   └─ invoice-payment-failed.ts
├─ seat-sync.ts      // reconcile Team.members.length → subscription_item.quantity
└─ types.ts

lib/billing/__mocks__/  // fixture responses keyed by scenario

Contract tests: 22 passing. Every handler has an idempotency test (replay the same event twice, assert state unchanged).
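The replay test described above relies on an idempotency ledger keyed by event id. A minimal in-memory sketch of the idea: the `EventLedger` class is a stand-in for the StripeEvent table with its unique `stripe_event_id`, and `handleOnce` is a hypothetical helper name.

```typescript
interface StripeEventLike {
  id: string; // Stripe event id, e.g. "evt_..."
  type: string;
}

// In-memory stand-in for a table with a unique index on the event id.
class EventLedger {
  private readonly seen = new Set<string>();

  // Returns true only the first time a given event id is recorded.
  record(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

// Applies the handler at most once per event id; replays are acknowledged and skipped.
function handleOnce(
  ledger: EventLedger,
  event: StripeEventLike,
  apply: (event: StripeEventLike) => void
): "applied" | "duplicate" {
  if (!ledger.record(event.id)) return "duplicate";
  apply(event);
  return "applied";
}
```

With a real database, recording the event id and applying the state change would belong in one transaction, so a crash between the two cannot leave an event marked as handled but never applied.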

Phase 4 — Live wiring. Added:

/api/billing/checkout (signed-in only, rate-limited 5/min).
/api/billing/webhook (raw body, signature-verified).
/billing page with tier cards + portal link post-subscribe.
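The "rate-limited 5/min" note on the checkout endpoint could be met with a fixed-window counter. A sketch under stated assumptions: the class name is hypothetical, the `Map` stands in for whatever shared store the app uses, and the transcript does not say which limiting algorithm was generated.

```typescript
// Fixed-window rate limiter keyed by caller (for example a user id).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counters = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the call is allowed and counts it against the current window.
  allow(key: string, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const entry = this.counters.get(key);
    if (entry === undefined || entry.windowStart !== windowStart) {
      // First call in a new window resets the counter.
      this.counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count++;
    return true;
  }
}
```

Usage: `new FixedWindowLimiter(5, 60000).allow(userId, Date.now())`. A fixed window permits short bursts across a window boundary; a sliding window or token bucket would smooth that out at the cost of more state.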

FLAG_STRIPE_LIVE defaulting to off in .env; on in staging. Handing to oc-code-auditor for the gate.

03 · Audit gate

oc-code-auditor before merge

oc-code-auditor

claude oc-code-auditor

Invoked by oc-integrations-engineer on a Stripe integration — running /oc-audit full with the money-flow supplementary pass (extra scrutiny on auth, webhook signatures, idempotency).

 security
   ✓ webhook endpoint verifies signature on raw body (not parsed)
   ✓ no secret leakage: STRIPE_SECRET never logged, never client-shipped
   ✓ CSRF: checkout endpoint requires session + same-origin check
   ⚠ rate-limit on /api/billing/webhook is 60/min; Stripe can burst to 150
       recommendation: raise to 300/min or use IP allow-list

 correctness
   ✓ handlers idempotent on stripe_event_id (unique index confirmed)
   ✓ seat-sync uses Stripe's subscription_items API (not replace_all)
   ✓ race between checkout.completed and user closing tab: handled via
       billing_status = 'pending' → 'active' transition

 money flow
   ✓ no proration path without explicit toggle
   ✓ subscription.deleted → billing_status = 'canceled' not NULL
   ✓ invoice.payment_failed → flipped to past_due, UI gate in place

 tests
   ✓ 22 contract tests + 14 handler tests + 6 E2E in Stripe test mode
   ✓ test clocks used, so runs are deterministic

 overall grade   A− (91/100)

Two action items before merge:

Raise webhook rate limit to 300/min (or switch to Stripe IP allow-list — preferred).
Add a test for the "user closes tab mid-checkout, returns next day" path — asserts we don't double-charge.

Writing .checkpoints/oc-code-auditor.checkpoint.json. Gate: CONDITIONAL PASS until those two are addressed.

Fix both and re-audit.

claude oc-integrations-engineer

Patching:

Swapping webhook rate limit for Stripe's IP allow-list (lib/billing/webhook.ts now checks X-Forwarded-For against Stripe's documented webhook IP addresses; rate-limiter removed).
Added abandoned-checkout-recovery.e2e.ts — simulates user closing tab between Checkout redirect and webhook arrival; asserts no duplicate charge, asserts billing_status = 'pending' expires after 1h back to 'free'.
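The abandoned-checkout behaviour above can be modelled as a small status function: `pending` lapses back to `free` after an hour unless the completion webhook arrives. A sketch with hypothetical names; only the status values and the one-hour expiry come from the transcript.

```typescript
type BillingStatus = "free" | "pending" | "active" | "past_due" | "canceled";

interface BillingState {
  status: BillingStatus;
  pendingSinceMs?: number; // set when Checkout starts
}

const PENDING_TTL_MS = 60 * 60 * 1000; // 1h expiry, as described above

// Status as the app should see it right now: a stale 'pending' reads as 'free'.
function effectiveStatus(state: BillingState, nowMs: number): BillingStatus {
  const expired =
    state.status === "pending" &&
    state.pendingSinceMs !== undefined &&
    nowMs - state.pendingSinceMs >= PENDING_TTL_MS;
  return expired ? "free" : state.status;
}

// checkout.session.completed handler: activating is idempotent, so a replayed
// or late webhook never creates a second transition.
function onCheckoutCompleted(state: BillingState): BillingState {
  return state.status === "active" ? state : { status: "active" };
}
```

Computing the expiry at read time avoids a background job; a late webhook still moves the account to `active` because the handler ignores how long the state was pending.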

Re-requesting audit.

claude oc-code-auditor

Re-running. Both items resolved, no regressions.

 overall grade   A (95/100)
 gate            PASS

Ship it.

04 · Go-live

feature flag flipped

oc-integrations-engineer

claude oc-integrations-engineer

PR merged into main. CI ran the full suite; green.

Launch plan for Friday:

10:00 — flip FLAG_STRIPE_LIVE on for 5% of accounts (internal team + 3 friendly users).
14:00 — if zero webhook failures + zero support tickets, roll to 100%.
Monitoring: Stripe's webhook dashboard + your error tracker's lib/billing/* scope.
Rollback: flip the flag off. Existing subscriptions keep running — they're server-side in Stripe.

oc-integrations-engineer checkpoint closed. You're shipped.

D1 → Postgres without the outage

Swap D1 for Postgres without downtime

16 steps · 4 stages · ≈ 12 exchanges

01 · Assess

oc-migration-ops sizes the engine swap

oc-migration-ops

Three tenants hit D1's row limit last week. We need to move to Postgres — Supabase is our pick — without downtime. Live Workers app, about 2M rows across 14 tables, real users on it right now. How do we not break this?

claude oc-migration-ops

Invoking oc-migration-ops in /oc-migrate assess mode.

Before writing a plan, I need four facts:

Target Postgres config — Supabase pooler (6543) or direct (5432)? Region?
Write volume — peak writes/sec and average? Determines whether dual-write can keep up.
Consistency tolerance — can the app tolerate a few seconds of read-after-write lag during cutover, or is it strictly read-your-writes?
Rollback horizon — how long do you want D1 kept as a fallback after cutover? (My default: 7 days.)

Also: is there a maintenance window option, or are we strictly zero-downtime?

1. Supabase pooler, us-east region. 2. Peak ≈ 180 writes/sec; average 40/sec. 3. Read-your-writes — it's a task tracker, people see the task they just created. 4. 7 days is fine. 5. Zero-downtime, no maintenance window.

claude oc-migration-ops

Writing the migration plan. Five phases, each with a named rollback gate. No phase touches prod until the one before it passes oc-monitoring-ops verification.

Schema mirror — Postgres spun up empty, schema applied, tested out-of-band. Zero prod impact.
Dual-write — every write hits D1 (authoritative) and PG (mirror). Drift checker runs every 5 min. Rollback: flag flip.
Backfill + verify — snapshot D1, stream into PG transactionally, hash-compare each table. 14/14 must match.
Read cutover — route reads to PG in 10% / 50% / 100% ramps, with a dual-read shadow for 48 hours. Rollback: flag flip → 0%.
Write cutover + D1 retire — writes go PG-only, D1 flipped read-only. 7-day rollback window. Then D1 is dropped.

Read-your-writes is satisfied at every phase: Phase 2 writes both stores before acknowledgement; Phase 4 reads from a store that has already received the write; Phase 5 flips writes to PG-only only after Phase 4 has validated that PG reads match D1 reads for 48 hours.

Writing .checkpoints/oc-migration-ops.checkpoint.json with the plan. /approve to proceed.

/approve

02 · Execute + verify

oc-migration-ops runs; oc-monitoring-ops watches

oc-migration-ops · oc-monitoring-ops

claude oc-migration-ops

Phase 1 — Schema mirror. Done. Supabase project provisioned; drizzle-kit applied 14 tables + 23 indexes; schema diff against D1 dump is empty.

Phase 2 — Dual-write. Flag FLAG_PG_DUAL_WRITE=true, Workers picked it up within 8 s (KV cache TTL). Drift checker is running every 5 min.

Handing off to oc-monitoring-ops to watch the dual-write phase before I advance.

claude oc-monitoring-ops

Watching the Phase 2 gate. 30-minute window:

Write p99 delta (D1-only vs. D1+PG): +1.8 ms. Within 15% budget.
PG write failure rate (best-effort mode): 0.00%.
Drift checker delta across 14 tables: 0.00% at every 5-min interval.
No new alerts fired.

Phase 2 gate: PASS. Signal sent to oc-migration-ops — proceed to Phase 3.

claude oc-migration-ops

Phase 3 — Backfill. Snapshotting each D1 table to R2, streaming into PG inside a transaction per table.

tasks          → 1,423,917 rows  → ✓ hash match
projects       →     4,204 rows  → ✓ hash match
users          →     2,891 rows  → ✓ hash match
… 11 more …

All 14 tables parity-verified. Backfill took 2h 4m; peak PG write throughput 4,200 rows/sec.
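The per-table "hash match" can be pictured as an order-independent digest computed on both sides and compared. A dependency-free sketch: FNV-1a (run over UTF-16 code units) is a non-cryptographic stand-in chosen for brevity, and the transcript does not say which hash the real check uses.

```typescript
type Row = Record<string, string | number | null>;

// 32-bit FNV-1a over the string's UTF-16 code units (non-cryptographic).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Canonical form: keys sorted, so column order does not change the hash.
function canonicalRow(row: Row): string {
  return JSON.stringify(Object.keys(row).sort().map((key) => [key, row[key]]));
}

// Row count plus XOR of per-row hashes: independent of row order.
function tableDigest(rows: Row[]): string {
  let acc = 0;
  for (const row of rows) {
    acc = (acc ^ fnv1a(canonicalRow(row))) >>> 0;
  }
  return `${rows.length}:${acc.toString(16)}`;
}

function tablesMatch(source: Row[], target: Row[]): boolean {
  return tableDigest(source) === tableDigest(target);
}
```

XOR-combining lets each side stream rows in any order, but identical duplicate rows cancel out and a 32-bit hash can collide, so a production check would likely pair a cryptographic hash with per-chunk comparison.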

Phase 4 — Read cutover (staged). Flipping FLAG_PG_READ_PCT: 0 → 10. Handing back to oc-monitoring-ops for the ramp gate.

claude oc-monitoring-ops

Phase 4 ramp monitor — watching dual-read shadow for discrepancy.

Ramp	Read p99 (ms)	Error rate	Discrepancy
D1 baseline	18.4	0.02%	—
10% PG	19.1	0.02%	0 / 21,842
50% PG	19.6	0.01%	0 / 110,204
100% PG	20.2	0.01%	0 / 221,407

Zero discrepancies across ~350K shadow reads. Latency drift is well under the 15% budget.
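A percentage ramp like the 10% / 50% / 100% steps above is commonly implemented by hashing a stable key into a bucket. A sketch under stated assumptions: the function names are hypothetical, the routing key is assumed to be something stable such as a user id, and the transcript does not show how FLAG_PG_READ_PCT is actually evaluated.

```typescript
// Maps a stable key to a bucket in 0..99 using 32-bit FNV-1a.
function bucketOf(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

// Routes a read to Postgres when the key's bucket falls under the ramp percentage.
function readBackend(routingKey: string, pgReadPct: number): "pg" | "d1" {
  return bucketOf(routingKey) < pgReadPct ? "pg" : "d1";
}
```

Because the bucket is derived from the key rather than a random draw, a key routed to Postgres at 10% stays there at 50% and 100%, and setting the percentage back to 0 returns every read to D1.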

Phase 4 gate: PASS. oc-migration-ops is clear to run Phase 5.

03 · Cross-project view

oc-orchestrator summarizes state

oc-orchestrator

/oc-ops status

claude oc-orchestrator

Scanning .checkpoints/ across registered projects.

▶ taskflow                                  [active]
  ✅ oc-reverse-spec      complete     Specs backfilled 3 weeks ago
  ✅ oc-app-architect     complete     v1.2 shipped last month
  🔄 oc-migration-ops     in_progress  Phase 4/5 verified; Phase 5 queued
  🔄 oc-monitoring-ops    in_progress  watching for Phase 5 gate
  ✅ oc-deploy-ops        complete     Last ship: read cutover 100%
  ⏳ oc-git-ops           queued       post-cutover commit pending

▶ gtrackr
  ✅ oc-app-architect     complete     Sprint 4/4 passed evaluator
  ⏳ oc-deploy-ops        not started  blocked on QA

/oc-ops next → advance taskflow to Phase 5 (write cutover). The 48-hour dual-read shadow window closes at 15:00 — oc-migration-ops is holding for that before flipping writes.

gtrackr has a longer-running blocker (QA) but no gate action from opchain until they sign off.

04 · Cut over + retire

oc-deploy-ops ships; oc-monitoring-ops watches the 7-day window

oc-deploy-ops · oc-monitoring-ops

claude oc-migration-ops

Phase 5 — Write cutover. Flipping FLAG_PG_DUAL_WRITE=false + FLAG_D1_READONLY=true. Writes now PG-only; D1 is frozen as a rollback snapshot.

Post-flip checks (15:08): write p99 = 24.1 ms (was 21.9 ms on dual-write; expected since PG is now the hot path). PG connection pool peaked at 38% saturation.

oc-monitoring-ops alerts now armed for the 7-day rollback window:

pg_error_rate > 0.5% for 5 min → page
pg_connection_saturation > 80% for 10 min → Slack
any_request_to_d1 → Slack (no code path should hit it)
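Each of the three rules above is a "threshold held for N minutes" condition with a route. The sketch below expresses them as data, assuming one metric sample per minute and modelling any_request_to_d1 as a request count above zero in the latest minute; neither detail is stated in the transcript, and `AlertRule` and `isFiring` are hypothetical.

```typescript
type Route = "page" | "slack";

interface AlertRule {
  metric: string;
  threshold: number;   // fire when the value is strictly above this
  forMinutes: number;  // for this many consecutive one-minute samples
  route: Route;
}

const pgErrorRate: AlertRule = { metric: "pg_error_rate", threshold: 0.5, forMinutes: 5, route: "page" };
const pgSaturation: AlertRule = { metric: "pg_connection_saturation", threshold: 80, forMinutes: 10, route: "slack" };
const anyD1Request: AlertRule = { metric: "any_request_to_d1", threshold: 0, forMinutes: 1, route: "slack" };

// True when the most recent `forMinutes` samples all exceed the threshold.
function isFiring(rule: AlertRule, perMinute: number[]): boolean {
  if (perMinute.length < rule.forMinutes) return false;
  const recent = perMinute.slice(perMinute.length - rule.forMinutes);
  return recent.every((v) => v > rule.threshold);
}
```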

oc-migration-ops hands control back. The scheduled job for D1 retire (7 days out) is registered under triggers.crons in wrangler.jsonc.

Checkpoint closed. You're running on Postgres.

SOC2-ready in one pass

Pass a security review before SOC2 audit

13 steps · 4 stages · ≈ 11 exchanges

01 · Threat model

oc-security-auditor runs STRIDE + OWASP

oc-security-auditor

First SOC2 audit in six weeks. We're a Next.js SaaS on Cloudflare + Supabase, 80 paying customers. I read the OWASP Top 10 once. Nobody here is a security specialist. What's going to fail when the auditor looks?

claude oc-security-auditor

Invoking oc-security-auditor in /oc-secaudit mode. Four-stage sweep:

Threat model — STRIDE per trust boundary, OWASP mapping.
Attack-surface map — every externally-reachable endpoint + what it trusts.
Hardening pass — CSP, TLS, DNS, WAF, rate-limiting, error handling.
Remediation backlog — ranked by SOC2 impact × exploitability.

Note: I operate above oc-code-auditor — I'm asking "what's the threat model?" and "is the infra hardened?", not "does this function have a bug?". I'll chain to oc-code-auditor once I have findings that need a code-level sweep underneath.

Give me: repo root, wrangler.toml / next.config.js / middleware.ts, your Supabase project settings (service-role vs. anon usage), and one example API route that reads tenant-scoped data. Starting the threat model in parallel.

Repo is clean Next 15 + middleware.ts. Supabase: we pretty much always use the service-role key server-side "because RLS is a pain". One example route is /api/tasks — reads `tasks` where `team_id = session.teamId`. middleware.ts just handles auth redirects right now. No CSP set anywhere. Cloudflare is on default settings.

claude oc-security-auditor

That admission, that you use the service-role key server-side "because RLS is a pain", is the audit killer. The service-role key bypasses RLS entirely, so for every tenant-scoped query the where team_id = session.teamId filter is the only isolation left. One missing or wrong filter is a cross-tenant data disclosure.

Running STRIDE per boundary. Top findings:

CRITICAL — Tampering. Supabase service-role key used for tenant-scoped queries. Fix: swap to per-tenant JWT with team_id claim, let RLS enforce.
HIGH — Info disclosure. No CSP. Any XSS gets free rein over the page.
HIGH — DoS. Cloudflare on default ruleset; /api/* has no rate-limit.
HIGH — Info disclosure. Production 500s render stack traces with source paths.
MEDIUM — Spoofing. Session cookie is SameSite=Lax — CSRF on state-changing GET.
MEDIUM — Elevation of privilege. Admin role checked at route level, not row level.
MEDIUM — Repudiation. No audit log for admin actions.
LOW — Tampering. Stripe webhook verifies HMAC but not IP origin (defence-in-depth gap).

OWASP map: A01 (Broken Access Control) FAIL, A05 (Security Misconfiguration) FAIL, A04 / A09 PARTIAL. The rest PASS.

Findings 1 + 2 are hard SOC2 blockers — the auditor will flag them in ten minutes. I'm chaining to oc-code-auditor now to confirm Finding #1's exploit path — is the RLS bypass actually triggered from any route, not just theoretically?

Writing .checkpoints/oc-security-auditor.checkpoint.json.

02 · Code-level chain

oc-code-auditor sweeps underneath

oc-code-auditor

claude oc-code-auditor

Invoked by oc-security-auditor for the code-level sweep underneath Finding #1. Running /oc-audit security with a narrow scope: every call-site of the Supabase client.

 files scanned       147 (TS/TSX)
 supabase imports    23 distinct call-sites
 service-role uses   23 / 23  ← all of them
 anon / user-jwt uses 0 / 23
 rls-aware           0 / 23

Every single tenant-scoped query runs with the service role. The where team_id = session.teamId filter is the only defence against cross-tenant reads. A missing where clause, a wrong column reference, or an injection that slips past Zod validation would leak across tenants.

Lint rule added to .eslintrc.json:

"no-service-role-in-request": "error"

CI will fail on any new service-role use from a request handler. The 23 existing call-sites are grandfathered via // eslint-disable-next-line comments with TODO(oc-security-auditor): B-1 — so they remain visible until remediation lands.

Chaining back to oc-security-auditor to fold this into the remediation backlog.

03 · Hardening plan

oc-security-auditor writes the infra playbook

oc-security-auditor

claude oc-security-auditor

Phase 3 — Hardening pass. Infra-level fixes that don't require code changes (or require minimal ones).

Content-Security-Policy. Ship a strict CSP with per-request nonce, in Report-Only mode for 7 days before enforcing. Catches false positives before they break customer flows.
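As a rough illustration of the Report-Only rollout described above, the header can be built from a per-request nonce and a mode switch. The directive set and the report endpoint are assumptions, not the plan's actual policy, and `buildCsp` is a hypothetical helper; the caller is expected to supply a fresh random nonce on every request.

```typescript
interface CspHeader {
  name: string;
  value: string;
}

// Build a nonce-based CSP. In report-only mode browsers log violations to
// the report endpoint without blocking anything.
function buildCsp(nonce: string, reportOnly: boolean, reportUri: string): CspHeader {
  const directives = [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}' 'strict-dynamic'`,
    "object-src 'none'",
    "base-uri 'none'",
    "frame-ancestors 'none'",
    `report-uri ${reportUri}`,
  ];
  return {
    name: reportOnly ? "Content-Security-Policy-Report-Only" : "Content-Security-Policy",
    value: directives.join("; "),
  };
}

// Week one: observe only. "/api/csp-report" is a placeholder endpoint.
const observed = buildCsp("r4nd0mN0nce", true, "/api/csp-report");
```

Switching to enforcement after the 7-day window is then a one-argument change, which keeps the policy text identical between the two modes.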

TLS.

Cloudflare SSL → Full (strict) — currently "Full", which accepts self-signed origin.
Min TLS version → 1.2 (was 1.0).
HSTS max-age=63072000; includeSubDomains; preload.
CAA DNS records restricting cert issuance to Let's Encrypt + Cloudflare.

WAF + rate-limit.

Enable CF Managed Ruleset + OWASP Core Rule Set (currently "basic").
Rate-limit /api/*: 60 req/min/IP anon, 600 req/min/IP auth. Response is Challenge page, not block — avoids false positives on corporate NAT.
Bot Fight Mode on /signup, /login, /api/auth/*.
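In this plan Cloudflare enforces the rate limit at the edge, so no application code is required. To make the 60 / 600 per-minute rule concrete, the counting logic can be sketched as a fixed-window counter; the window algorithm and the names `createLimiter` and `Decision` are assumptions, since the transcript does not say how the limit is computed.

```typescript
type Tier = "anon" | "auth";
type Decision = "allow" | "challenge";

// Per-minute, per-IP limits from the hardening plan.
const LIMIT_PER_MINUTE: { anon: number; auth: number } = { anon: 60, auth: 600 };
const WINDOW_MS = 60000;

interface WindowState {
  startMs: number;
  count: number;
}

// Returns a decide() function that tracks one fixed window per tier and IP.
function createLimiter(): (ip: string, tier: Tier, nowMs: number) => Decision {
  const windows: { [key: string]: WindowState | undefined } = {};
  return function decide(ip: string, tier: Tier, nowMs: number): Decision {
    const key = tier + ":" + ip;
    const current = windows[key];
    if (current === undefined || nowMs - current.startMs >= WINDOW_MS) {
      windows[key] = { startMs: nowMs, count: 1 };
      return "allow";
    }
    current.count += 1;
    // Over the limit the caller is challenged rather than blocked.
    return current.count <= LIMIT_PER_MINUTE[tier] ? "allow" : "challenge";
  };
}
```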

Error handling. global-error.tsx renders a correlation id; the full trace only lives in the log aggregator.

Cookie hardening. SameSite=Strict + __Host- prefix on the session cookie.
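The `__Host-` prefix only takes effect when the cookie is also `Secure`, has `Path=/`, and carries no `Domain` attribute; browsers reject it otherwise. A small sketch of the resulting header value follows, where the cookie name `__Host-session`, the `HttpOnly` flag, and the lifetime parameter are illustrative assumptions.

```typescript
// Build a Set-Cookie value that satisfies the __Host- prefix rules:
// Secure, Path=/, and no Domain attribute.
function sessionCookie(value: string, maxAgeSeconds: number): string {
  return [
    `__Host-session=${encodeURIComponent(value)}`,
    "Path=/",
    "Secure",
    "HttpOnly",
    "SameSite=Strict",
    `Max-Age=${maxAgeSeconds}`,
  ].join("; ");
}

const header = sessionCookie("abc", 3600);
```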

Admin audit log. New admin_audit_log table, 7-year retention (SOC2 CC7.2).

Writing the hardening plan.

04 · Remediation backlog + deploy gate

oc-security-auditor hands off to the build pipeline

oc-security-auditor · oc-deploy-ops

OK. How do I know the next deploy actually cleared all this?

claude oc-security-auditor

I update the oc-deploy-ops gate to require a passing oc-security-auditor check before any prod deploy. From now on, /oc-deploy prod runs:

oc-code-auditor /oc-audit pre-deploy — code-level findings.
oc-security-auditor /oc-sec verify — re-scans the top-8 STRIDE findings; any still RED blocks the deploy.
oc-deploy-ops gate proceeds only when both pass.

Remediation backlog is ranked for a two-sprint burn:

Sprint A (this week): B-1 (RLS fix), B-2 (CSP report-only), B-3 (stack-trace strip), C-1 (WAF), C-2 (TLS / HSTS).
Sprint B (next week): B-2 enforce, C-3 (admin audit log), D-1 (SameSite=Strict), D-2 (Stripe IP), D-3 (CAA).

Each goes through the normal oc-app-architect build loop — oc-code-auditor + oc-security-auditor + oc-deploy-ops chain on every sprint. No one-off patches.

Re-run /oc-secaudit after Sprint A; I'll re-score and show you what moved.

claude oc-deploy-ops

Noted. oc-deploy-ops pre-flight updated:

 gate checks (prod deploy)
   ✓ oc-code-auditor:      grade ≥ B (required)
   ✓ oc-security-auditor:  no OPEN findings of severity HIGH or CRITICAL
   ✓ tests:                full suite green
   ✓ type-check:           clean

The current deploy would fail the security gate — B-1 and B-2 are OPEN CRITICAL/HIGH. Next green deploy is after Sprint A.
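The four gate checks can be read as a single function that returns the reasons a deploy is blocked. This is a sketch of that logic only; the real gate lives inside oc-deploy-ops, and the types and field names below are hypothetical.

```typescript
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

interface Finding {
  id: string;
  severity: Severity;
  open: boolean;
}

interface GateInput {
  codeGrade: string; // letter grade such as "A" or "B"
  findings: Finding[];
  testsGreen: boolean;
  typeCheckClean: boolean;
}

// Returns one reason per failed check; an empty list means the deploy may proceed.
function blockers(input: GateInput): string[] {
  const reasons: string[] = [];
  // Compare only the letter: "A" and "B" pass, "C" and below fail.
  if (input.codeGrade.charAt(0) > "B") reasons.push("code grade below B");
  input.findings.forEach((f) => {
    if (f.open && (f.severity === "HIGH" || f.severity === "CRITICAL")) {
      reasons.push(`open ${f.severity} finding ${f.id}`);
    }
  });
  if (!input.testsGreen) reasons.push("tests failing");
  if (!input.typeCheckClean) reasons.push("type-check errors");
  return reasons;
}
```

Returning reasons rather than a boolean lets the gate output say which backlog items, such as B-1 and B-2, are still holding the deploy.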

Checkpoint: .checkpoints/oc-deploy-ops.checkpoint.json.

MCP under HIPAA, audited end-to-end

Roll out Claude Code + MCP across a Fortune-500 health insurer

18 steps · 5 stages · ≈ 12 exchanges

01 · Threat model

oc-security-auditor sweeps the MCP attack surface

oc-security-auditor

We're MeridianHealth — F500 health insurer, ~220 engineers, HIPAA, ISO 27001, SOC2 Type II. Engineers want Claude Code with MCP. Our AI Governance Committee said "show us the threat model first." We're worried about three things: (1) does MCP introduce a new way for PHI to leak; (2) we don't want long-lived API tokens on laptops; (3) what does the audit story look like if a regulator asks "show me every AI tool call". What do we do?

claude oc-security-auditor

Invoking oc-security-auditor in MCP mode. This is exactly the gap the skill was extended to cover — the protocol itself, not the downstream systems.

Four-stage sweep, MCP-specific:

STRIDE per MCP boundary — five boundaries: client→local-MCP (stdio), client→remote-MCP (sse), MCP→downstream system, MCP→audit, tool-result→model-context (the prompt-injection return path).
Allowlist + scope review — server-by-server risk rating, tool-by-tool scoping for the high-risk ones.
Hardening pass — broker, redactor, audit pipeline, RBAC matrix.
Compliance crosswalk — HIPAA Security Rule, SOC2, ISO, your internal AI-Use Policy.

Each stage hands off to the right skill: oc-integrations-engineer for the broker and FHIR Proxy MCP; oc-monitoring-ops for the audit pipeline; oc-app-architect for the rollout.

Need: list of systems engineers want MCP context for (PM tools, repo, infra dashboards, FHIR), your existing identity + secret + egress posture (Okta? Vault? Zscaler?), and a sample of which engineering teams would pilot.

Okta SSO, HashiCorp Vault, Zscaler, Splunk, GitHub Enterprise, Linear. PM context = Linear and Atlassian. Repo context = GitHub. Infra = Cloudflare. Member context = our Epic-backed FHIR API + claims data warehouse in Snowflake. Pilot = appsec + platform-eng + sre, ~12 engineers.

claude oc-security-auditor

Running STRIDE per boundary. Top findings:

CRITICAL — M-1 (info disclosure, client→SaaS-MCP). A SaaS MCP receives every tool argument. If an engineer pastes PHI then asks the MCP to "summarise", the SaaS vendor's logs now hold PHI. HIPAA breach + 60-day notification clock.
CRITICAL — M-2 (elevation of privilege, local credential store). Default install puts long-lived API tokens in plaintext under ~/.config/claude/mcp.json. Laptop loss = full token leak.
CRITICAL — M-5 (tampering, tool-result return path). Anyone who can write to a downstream system (e.g. open a Linear issue) can plant prompt-injection text; the next agent reading that issue pivots.
HIGH — M-3 (local MCP supply-chain). Malicious npm-distributed MCP package executes with engineer privileges; reads ~/.aws, ~/.kube, etc.
HIGH — M-4 (local-MCP arg disclosure). Local MCPs see the full argument body, including any PHI the engineer pasted.
HIGH — M-8 (audit gap). No protocol-level requirement that calls are logged. HIPAA §164.312(b) audit-controls finding waiting to happen.
HIGH — M-6 (stale write under prompt). Write tools (save_issue, create_or_update_file) can be invoked under stale prompts; no per-action confirmation.
MEDIUM — M-7 (broker DoS). Runaway agent burns daily quota.

HIPAA mapping: M-1, M-2 implicate §164.308(a)(4) and §164.312(a)(1). Until those are remediated, no PHI-bearing system can sit behind an MCP. That includes your FHIR API and Snowflake.

Three of the eight are CRITICAL. Chaining to oc-integrations-engineer for the broker + redactor + custom FHIR-Proxy MCP design — those are the controls that close M-1, M-2, and (combined with manual write confirmation) M-5.

Writing the threat model to .checkpoints/oc-security-auditor.checkpoint.json.

02 · Allowlist + broker

oc-integrations-engineer designs the controlled deployment

oc-integrations-engineer

claude oc-integrations-engineer

Picking up from oc-security-auditor's findings. Three workstreams in parallel:

Workstream A — Allowlist. Server-by-server risk rating, tool-level scoping, deny-by-default at Zscaler. The denied list is more interesting than the allowed list:

DENIED: Google Drive, Gmail, Calendar, Indeed, Amplitude (prod), Supabase (prod). All carry PHI in your environment; the protocol can't be reconciled with §164.312(c).
ALLOWED tier A: Linear, GitHub Enterprise (custom fork with repo-allowlist), Atlassian, Figma read-only, ServiceNow.
ALLOWED tier B (restricted): Cloudflare DPs read-only for non-platform teams; the prod CF account stays Platform-only.
CUSTOM (Tier A): the FHIR Proxy MCP — built for this rollout, wraps Epic FHIR with PHI redaction at the protocol boundary.

Workstream B — Broker. mcp-broker (~3,000 LOC Go, deployed via the existing Helm chart) sits between Claude Code and every MCP. It mints 5-minute scoped JWTs per-call, backed by Vault. No long-lived tokens on laptops; M-2 closed.

Workstream C — Redactor. mcp-redactor runs as a sidecar on the broker. Strips/rejects PHI patterns and credential patterns from tool arguments before forwarding. Closes M-1.
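A pattern-based redactor of the kind described in Workstream C might look like the sketch below. The two rules (a US SSN shape and an AWS access key id shape) are illustrative stand-ins; the actual mcp-redactor rule set is not shown in the transcript, and `redact` is a hypothetical function.

```typescript
interface RedactionRule {
  name: string;
  pattern: RegExp;
}

// Illustrative rules only: one PHI-like pattern and one credential pattern.
const RULES: RedactionRule[] = [
  { name: "us_ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { name: "aws_access_key_id", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
];

// Replace every match with a labelled placeholder and report which rules hit.
function redact(arg: string): { text: string; hits: string[] } {
  let text = arg;
  const hits: string[] = [];
  RULES.forEach((rule) => {
    const replaced = text.replace(rule.pattern, `[REDACTED:${rule.name}]`);
    if (replaced !== text) hits.push(rule.name);
    text = replaced;
  });
  return { text, hits };
}
```

The rule names returned in `hits` are the kind of value an audit record can carry as "redaction-rule hits" without storing the original text.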

The hard piece is the FHIR Proxy MCP. Your engineers need member context for engineering work. Generic regex redaction misses free-text PHI in note fields. Solution: wrap Epic FHIR with a custom MCP that applies a Privacy-Office-signed redaction dictionary (Presidio + healthcare terms). Tool surface is narrowed: fhir_resource_get returns redacted, fhir_search is aggregate-only with 100-row cap, fhir_export_bundle is not exposed at all.

Total build: ~4 weeks, two engineers + Privacy Office consultation.

Egress story for laptops? We don't want a developer somehow routing around Zscaler and pulling raw FHIR locally.

claude oc-integrations-engineer

Three layers of egress control:

DNS. Corp resolver only; FQDN for fhir-proxy.meridianhealth.internal resolves only inside the corp network. Off-corp = no path.
Zscaler. All MCP-bound traffic (sse + https) routes through Zscaler with FQDN allowlist enforced. The allowlist is generated from mcp-allowlist.yaml on every merge and pushed via Zscaler API. Engineers cannot edit it.
Broker. Even if a laptop reaches an MCP server's network endpoint, no MCP responds without a broker JWT. The broker is corp-network-only and Okta-attested.

A seized laptop or an off-network device sees: no resolution, no Zscaler routing, no token. Three layers because we expect at least one to be misconfigured at any given time.
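The broker layer rests on tokens that are scoped to one server and tool and that expire after five minutes. The claim layout below is a sketch under assumptions: `sub`, `aud`, `iat`, and `exp` are standard JWT claims, while the `scope` format and both function names are hypothetical. Signing and Vault integration are deliberately left out.

```typescript
interface ScopedClaims {
  sub: string;   // engineer identity
  aud: string;   // the MCP server this token is valid for
  scope: string; // "<server>:<tool>", a hypothetical format
  iat: number;   // issued-at, seconds since epoch
  exp: number;   // expiry, seconds since epoch
}

const TTL_SECONDS = 5 * 60;

// Mint claims for exactly one server and tool.
function mintClaims(sub: string, server: string, tool: string, nowSeconds: number): ScopedClaims {
  return {
    sub,
    aud: server,
    scope: `${server}:${tool}`,
    iat: nowSeconds,
    exp: nowSeconds + TTL_SECONDS,
  };
}

// An MCP server accepts the token only for the matching call, before expiry.
function isUsable(claims: ScopedClaims, server: string, tool: string, nowSeconds: number): boolean {
  return (
    claims.aud === server &&
    claims.scope === `${server}:${tool}` &&
    nowSeconds >= claims.iat &&
    nowSeconds < claims.exp
  );
}
```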

Local stdio-only MCPs (e.g. a hypothetical local-filesystem MCP — we don't enable it for this rollout) bypass the network entirely. We mitigate that surface (M-3, supply-chain) by allowlisting the binaries that mcptl distributes through our internal Homebrew tap; no random npm install of an MCP package on corp laptops.

03 · Audit pipeline

oc-monitoring-ops wires the SIEM forwarder

oc-monitoring-ops

claude oc-monitoring-ops

Picking up from oc-integrations-engineer. The broker is the only enforcement choke-point for tool calls — perfect place to emit audit records.

Pipeline. mcp-broker → fluentbit sidecar → Splunk HEC (hot, 90-day) + BigQuery (cold, 7-year). On HEC failure, fluentbit spools to encrypted disk + alerts after 30-min outage. Loss budget: zero records dropped silently.

Schema. Every record has the actor (Okta sub + team + device), the MCP server + tool + version, request hash + redaction-rule hits + scope JWT + size, response outcome + result hash + size + duration, plus the policy versions in force at the time. Hashes, not bodies. A 7-day encrypted body trace lives at the broker for forensics; disclosure requires Privacy Officer 2-of-2 with IR Lead.

Detection rules deployed.

Suspicious volume per actor (mean + 4σ).
Cross-team write attempt (denied, scope_violation).
Denied-MCP egress attempt (Zscaler-correlated).
Redactor reject burst (> 20 in 60s same actor).
FHIR Proxy unusual access (> 500 calls/day per engineer).
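Two of the rules above are simple enough to sketch directly: the per-actor volume check (mean + 4σ) and the redactor reject burst (more than 20 in 60 s). The version below assumes a population standard deviation over the actor's own daily history, which the transcript does not specify; all function names are hypothetical.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Population standard deviation.
function stddev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) * (x - m))));
}

// Flags today's call count when it exceeds mean + sigmas * stddev of the history.
function isVolumeAnomaly(history: number[], today: number, sigmas: number): boolean {
  if (history.length < 2) return false; // not enough data for a baseline
  return today > mean(history) + sigmas * stddev(history);
}

// Flags more than `limit` rejects by one actor inside the trailing window.
function isRejectBurst(rejectTimesMs: number[], nowMs: number, limit: number, windowMs: number): boolean {
  const recent = rejectTimesMs.filter((t) => t > nowMs - windowMs && t <= nowMs);
  return recent.length > limit;
}
```

With a history of 100, 110, 90, 105, and 95 calls per day, the mean is 100 and the standard deviation is about 7.07, so the 4σ threshold sits near 128 calls.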

Dashboards: mcp-overview, mcp-anomalies, mcp-redaction. Privacy Officer reviews mcp-redaction and mcp-anomalies weekly. Monthly export to the AI Governance Committee.

This is the §164.312(b) audit-controls answer for the next regulator inquiry.

If a regulator asks "show me every tool call by engineer X in March", how fast can we answer?

claude oc-monitoring-ops

Splunk: minutes. index=mcp_audit actor.sub=alice@meridianhealth.com earliest=-1mon@mon latest=@mon returns the full audit log, hashed bodies, redaction hits, scopes. Within hot-tier (90 days), interactive.

For older queries (90d-7y) the answer comes from BigQuery: same schema, queried in SQL rather than SPL. 5-10 min for a typical engineer-month slice.

If they ask "show me what engineer X actually saw" (the bodies, not the hashes), that's privileged: Privacy Officer + IR Lead 2-of-2 to decrypt the trace, performed in the forensic enclave under existing IR procedure. Body trace is only retained 7 days at the broker — older bodies cannot be reconstructed even with full keys. We accept that gap; the alternative (long-term body retention) would expand the breach blast radius far more than the forensics gap costs.

For the lookup a regulator would actually run ("did MCP cause this?"), Splunk alone gives them the answer.

04 · RBAC + rollout

oc-app-architect runs the wave plan

oc-app-architect

claude oc-app-architect

Last two pieces: who can do what, and how we get from 12 to 220 without a HIPAA event.

RBAC matrix. 7 roles × 12 teams, scopes expressed in broker policy YAML. Onboarding (< 90d) is read-only on PM tools; cross-team writes always denied at the broker; FHIR Proxy access is gated to engineering teams whose work product requires schema awareness; Privacy Officer holds the body-trace disclose scope (2-of-2 with IR Lead).

Rollout — four waves over 14 weeks.

Wave 0  (W1-2)   Pilot                12 engs   appsec + platform + sre
Wave 1  (W3-6)   Hardening            ~30       + infrastructure + ml-platform
Wave 2  (W7-10)  Engineering majority ~120      most teams; CF read-only, Snowflake Proxy
Wave 3  (W11-14) PHI-adjacent         220       + member-portal + claims + ehr-integrations + FHIR Proxy

Each wave gated by audit-pipeline green + redactor false-positive rate + zero successful cross-team writes + zero PHI leakage + a Privacy Officer signature. Abort criteria are explicit: any successful cross-team write surfaced after the fact, audit-log loss > 0.01%, broker outage > 30 min, or any HIPAA-reportable event halts the wave.

Wave 3 has its own pre-wave gates: 30-day FHIR Proxy stability burn-in on synthetic data, signed Privacy Office Presidio dictionary, external pen-test on the Proxy, and a separate Committee motion specifically authorising Wave 3.

GA at W15+ with allowlist changes via PR + 2-of-3 sign-off, quarterly RBAC review, monthly committee export, annual SOC2 + ISO inclusion.

05 · Compliance attestation

oc-security-auditor closes with the crosswalk

oc-security-auditor

Final question. Before the Committee meeting next week — what do I actually hand them?

claude oc-security-auditor

Single packet. Seven artifacts:

MCP threat model (this run).
Allowlist + per-server risk rating.
Broker + egress + FHIR Proxy architecture.
Audit pipeline + detection rules.
RBAC matrix.
Rollout plan with abort criteria.
Compliance crosswalk mapping every control to HIPAA §164, SOC2 TSC, ISO 27001 Annex A, and your AI-Use Policy v3.4.

Plus the existing artefacts they expect: Anthropic BAA + ZDR, per-MCP-vendor BAA, Privacy Office Presidio dictionary signature.

Estimated independent-assessor effort to certify the bundle: ~16 hours, because the bundle reuses your existing Okta + Vault + Zscaler + Splunk + Privacy Office controls. Greenfield AI controls usually run ~40 hours for the same depth.

The Committee's three concerns map cleanly onto the artefacts:

"Does MCP introduce a new way for PHI to leak?" → threat model M-1, M-3, M-4, M-5; redactor + FHIR Proxy + tool-result envelope close them.
"We don't want long-lived API tokens on laptops" → broker + 5-min JWTs close M-2.
"What does the audit story look like?" → audit pipeline + Splunk dashboards + 7-year BigQuery + body-trace forensic procedure.

Recommend approving Wave 0 in this meeting, holding subsequent waves for the standing review. Writing .checkpoints/oc-security-auditor.checkpoint.json (Phase 4 — verification + sign-off).

MCP behind the air gap, ATO-clean

Stand up Claude Code + MCP for a CMMC L3 defense prime

15 steps · 4 stages · ≈ 11 exchanges

01 · Threat model

oc-security-auditor sweeps with cross-domain lens

oc-security-auditor

We're Daedalus Aerospace, defense prime, ~480 software engineers across two programs (Iron is CUI-only, Aegis is mixed CUI + Secret). CMMC 2.0 L3, FedRAMP High posture for the GovCloud side, IL5 workload boundary, ITAR + EAR source. The DoD CIO AI memo just landed; PMO is asking when Claude Code is authorised. We have STIG-hardened workstations, PIV/CAC, on-prem GitLab, on-prem Jira, on-prem Splunk, Forcepoint CDS guards, BeyondTrust PAM, Tanium endpoint. Walk me through what authorising this looks like — including what we should explicitly not do.

claude oc-security-auditor

Invoking oc-security-auditor in MCP mode with the CMMC / FedRAMP High / IL5 lens active. Five-stage sweep:

STRIDE per MCP boundary with cross-domain emphasis.
Authorisation matrix — what can run on-prem; what cannot exist in this environment at all.
Cross-domain rules — Forcepoint Guard additions for MCP envelopes.
CMMC L3 control delta mapped to AC, AU, IA, SC, SI families.
ATO modification + STIG workstation profile (chained to oc-deploy-ops).

The first thing I'm going to tell you, before any technical detail: no SaaS MCP can be authorised under your posture. Linear, GitHub.com, Cloudflare DPs, Atlassian Cloud, Figma — all of them DENY at the allowlist level, full stop. Tool args carry CUI; tool args reaching a SaaS vendor is presumptively an ITAR / EAR violation regardless of TLS strength. That's the policy gate.

The second thing: the Secret enclave is out of scope this iteration. Adding MCP to a Secret-side workstation requires its own threat model with its own cross-domain analysis; we'll plan that separately. Iteration one is CUI-only.

Need: copy of the existing SSP for the engineering platform, list of programs in scope, the existing Forcepoint CDS rule set, your STIG baseline + version, and the AO contact. Starting the threat model in parallel.

SSP is on file. Programs in scope iteration 1 = Iron CUI-only. Aegis is split — CUI side is in scope, Secret side is out. Forcepoint rule set is the standard 2024 baseline. STIG = RHEL 9 V1R6. AO designee email is on file.

claude oc-security-auditor

Running STRIDE per boundary. Top findings (10 total; details in artifact):

CRITICAL — D-2 (SaaS egress). Default config would route engineer tool args through commercial SaaS MCPs. ITAR + ATO violation. Closes via on-prem-only allowlist.
CRITICAL — D-1 (long-lived tokens). Default install puts static API tokens on STIG workstations. AC-2 + IA-5 violation. Closes via PIV-rooted broker with 3-min JWTs.
CRITICAL — D-3 (supply chain). npm-distributed MCPs bypass SI-7 integrity. Closes via vendored + SLSA-signed RPM-only distribution.
CRITICAL — D-5 (cross-domain prompt injection). Tool-call results returned in CUI enclave could include strings introduced on the high side via existing high→low advisory pipe. First-class spillage vector. Closes via Forcepoint MCP-envelope signatures.
HIGH — D-4 (tool-result agent pivot). Same as commercial MCP threat, with higher impact because the agent's next move could be a code commit / SCAP scan / Jira transition.
HIGH — D-7 (audit gap). AU-2 / AU-3 / AU-12 fail without a broker-emitted audit pipeline.
HIGH — D-8 (CDS rules don't cover MCP). Existing Guard rules treat envelopes as ordinary application data.
HIGH — D-10 (PIV non-consumption). Default MCP install doesn't consume PIV; IA-2 / IA-5 fail.
MEDIUM — D-9 (path metadata). File paths are themselves CUI metadata.
MEDIUM — D-6 (audit-marking ripple). Tool-results re-entering audit could mark the audit log itself.

Mapping: AC-2, AC-3, AC-4, AC-6, AU-2, AU-3, AU-12, IA-2, IA-5, SC-7, SI-7 are all OPEN until the deltas land. ATO modification is required regardless of how clean the implementation is — adding new system components hits the SSP.

Chaining to oc-integrations-engineer for the on-prem fleet design and the cross-domain rules. oc-deploy-ops picks up the ATO mod package + STIG workstation profile after.

Writing the threat model + authorisation matrix.

02 · On-prem fleet + cross-domain rules

oc-integrations-engineer designs the in-enclave deployment

oc-integrations-engineer

claude oc-integrations-engineer

Picking up the design. Three workstreams in parallel:

A. On-prem MCP fleet. Five servers, all forked (or upstream-vendored) implementations of the open MCP reference, hosted in the Daedalus internal Git mirror, built in our own pipeline with SLSA L3 attestation, signed by the platform team's hardware key, distributed only via the internal RPM repo. No live npm install of an MCP package on a workstation.

The fleet:

GitLab on-prem MCP (read + scoped write to project allowlist).
Jira on-prem MCP.
GitHub-EE MCP (only for the OSS-mirror program).
Daedalus Program Lake MCP — custom; aggregate-only telemetry queries; CUI-marked artifacts return structural metadata only; high-side queries refused (CDS path is separate).
Splunk Enterprise MCP (read-only, index allowlist enforced server-side).

B. Broker. Same shape as the F500 design, three differences: PIV/CAC auth (not Okta), 3-minute JWT TTL (not 5), FIPS 140-3 module signing (existing HSM). Closes D-1, D-10.

C. Cross-domain rules. Forcepoint Guard signatures for the MCP envelope: tools/call, tools/list, resources/read, prompts/get, plus the response shape with isError + content. Block + alert ESS, both directions. Closes D-5, D-8.
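The matching logic behind such a signature can be sketched in a few lines: MCP messages are JSON-RPC 2.0, so a payload is treated as an MCP envelope when it names one of the listed methods or carries a result with both `content` and `isError`. This illustrates the decision only; it is not Forcepoint rule syntax, and `isMcpEnvelope` is a hypothetical name.

```typescript
const GUARDED_METHODS = ["tools/call", "tools/list", "resources/read", "prompts/get"];

// True when the payload looks like an MCP request or tool-result envelope.
function isMcpEnvelope(payload: string): boolean {
  let msg: unknown;
  try {
    msg = JSON.parse(payload);
  } catch (_err) {
    return false; // not JSON, so not an envelope
  }
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as { jsonrpc?: unknown; method?: unknown; result?: unknown };
  if (m.jsonrpc !== "2.0") return false;
  // Request side: one of the guarded MCP methods.
  if (typeof m.method === "string" && GUARDED_METHODS.indexOf(m.method) !== -1) {
    return true;
  }
  // Response side: a result carrying both content and isError.
  const r = m.result;
  if (typeof r === "object" && r !== null) {
    const res = r as { content?: unknown; isError?: unknown };
    return Array.isArray(res.content) && typeof res.isError === "boolean";
  }
  return false;
}
```

A plain counter summary such as `{"denied":3,"redactor_hits":12}` has no `jsonrpc` field and is not matched.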

Implementation: ~30 engineering weeks total, parallelisable across 4 engineers + ISSO + ISSM, plus 4-8 weeks for SSP modification + AO approval cycle.

Tell me again why we can't use the cloud Anthropic Linear MCP. The vendor has a SOC2 Type II.

claude oc-integrations-engineer

Three reasons, in order of bindingness:

ITAR / EAR. Tool arguments routinely contain controlled technical data — function names, file paths inside CUI workspaces, design data references. Sending those to a vendor outside FedRAMP High in a non-DoD-approved region is a de jure export. SOC2 doesn't change that.
DoD CIO Memo 23-XXXX. AI tools must operate within authorised boundaries (IL-appropriate). Linear's hosted MCP is Anthropic-shipped but the data plane is Linear's commercial cloud — IL2 at best, not IL5.
Anthropic FedRAMP High status is pending. Even if you wanted to argue it, the underlying API layer that Claude Code talks to isn't itself FedRAMP High yet (tracked in POAM-MCP-1). When that lands, the conversation reopens for the API, not for SaaS-hosted MCPs.

The on-prem fleet is the only path that closes all three. It happens to also be the path that makes audit + cross-domain enforcement tractable, which is why the design starts there even before the policy reasons.

03 · Audit consolidation

oc-monitoring-ops wires Splunk + ESS

oc-monitoring-ops

claude oc-monitoring-ops

Audit pipeline design — same shape as the F500 plan with the on-prem-only constraint and three additions:

Audit forward is in-enclave Splunk Enterprise, not a cloud HEC. Existing Splunk Enterprise + UF (Universal Forwarder) handles it; the broker emits records to a dedicated index mcp_audit. No greenfield audit infra.
ESS integration. ESS already consumes the existing Splunk feed for insider-threat correlation. New mcp_audit events flow into existing Tier 1 / Tier 2 case routing. New rule set:
- Suspicious volume per actor (mean + 4σ).
- Cross-team / cross-program write attempt (denied, scope_violation).
- Redactor reject burst (> 20 in 60s).
- Forcepoint MCP-envelope block (any direction).
- Program Lake unusual access (volume + classification-marker hits).
Cross-domain audit summary. A daily cron emits a sanitised summary of mcp_audit to the high-side review channel via the existing high-watermark forwarder. The summary contains only counts, redactor hit rates, and denied rates: no actor identifiers, no payloads, and no MCP envelopes. The Forcepoint Guard signature (§2.1) doesn't trigger because the summary is structured event-counter content, not an MCP envelope.

Forensic trace: 7-day encrypted body trace at the broker, HSM-sealed, 2-of-2 (ISSO + IR Lead) to decrypt.

Retention: 7 years on-prem cold (existing Splunk Enterprise cold-volume policy). No cloud cold storage.

This closes D-7 + the AU-2 / AU-3 / AU-12 family findings.

04 · ATO mod + STIG profile

oc-deploy-ops produces the package

oc-deploy-ops

claude oc-deploy-ops

Picking up from the security + integrations + monitoring outputs. Producing the ATO modification package + STIG-hardened workstation profile.

SSP modification SSP-2026-MCP-001. §3 (system description) gets the MCP fleet topology; §10/§11 get the AC / AU / IA / SC / SI control delta; §13 stays unchanged (no new interconnections — all on-prem). Reciprocity language so sister programs can inherit.

RAR addendum captures the 10 STRIDE findings and three accepted residuals: cross-domain prompt injection (low if Guard rules hold), tool-result agent pivot (low for documented flows), and supply chain (low pending annual SLSA review).

POAM updates:

POAM-MCP-1: Anthropic API FedRAMP High pending.
POAM-MCP-2: annual SLSA refresh.
POAM-MCP-3: quarterly Guard rule review.
POAM-MCP-4: 90-day insider-threat tuning.
POAM-MCP-5: Secret-enclave architecture (18-24 months).

STIG workstation delta. Additions to the existing RHEL 9 V1R6 baseline: SELinux policy modules for Claude Code + each MCP, auditd rules for MCP execve + workspace writes, firewalld outbound rules for in-enclave MCP only, SCAP daily checks for RPM signing + no-static-token + SELinux enforcement + firewalld + auditd. Tanium fleet sweep weekly.

Onboarding gate. Five items: SCAP green, training done, user agreement signed, smart-card enrolled, 7-day green Tanium posture.

CCB-to-deploy estimate: 12-16 weeks from CCB-ready packet, dominated by AO review window. Wave 0 pilot may begin only after eight pre-conditions are met, including Anthropic API FedRAMP High either authorised or pre-prod waiver in place.

When can my engineers actually start using this?

claude oc-deploy-ops

Realistic timeline:

Now → W6: SSP modification package authoring; CCB read-out; ISSM + AO designee review.
W6 → W18: AO review + approval cycle. Concurrent: platform team builds the on-prem fleet (≈ 12 weeks).
W18 → W22: pre-prod deployment + Wave 0 pre-conditions checklist + synthetic-event drills.
W22 → W24: Wave 0 (Program-Iron platform team, ~10 engineers).
W24 → W30: Wave 1 (Program-Iron remaining engineering).
W30 → W38: Wave 2 (Program-Aegis CUI-side).
W38+: steady state.

The dependency that can break this: Anthropic FedRAMP High. Until it lands, your authorised state is "pre-prod waiver only" — which works for evaluation but not for production engineering use. PMO should keep that on a weekly status call.

Honest read: first engineer on the keyboard with a real production MCP scope ≈ week 22-24 from the date the CCB packet is ready. If the AO is fast it's earlier; if Anthropic FedRAMP slips it's later.

Checkpoint: .checkpoints/oc-deploy-ops.checkpoint.json.

PM-aware pipeline, Linear-led

Run a Linear ticket end-to-end through the v1.2 pipeline

18 steps · 4 stages · ≈ 12 exchanges

01 · Read the ticket

oc-app-architect /oc-discover pre-fills from Linear

oc-app-architect

/oc-discover --ticket PLAT-4471

claude oc-app-architect

v1.2 PM-aware mode active. Reading the ticket through the configured Linear MCP.

mcp.linear.get_issue("PLAT-4471") → 200 (3.4KB)
  title:       Add CSV export to /api/customers
  state:       Todo
  priority:    Medium (P3)
  cycle:       2026-W19
  assignee:    sam@onramp.dev
  labels:      [customer-success, exports]
  description: 2.1KB
  comments:    4 (ana×2, lily, sam)
  related:     PLAT-4472 (blocked-by), PLAT-4011 (precedent)

Discovery shape — 8 standard questions; 5 pre-filled from the ticket:

✅ Problem: customer success copy-pastes the customer list into Sheets; that flow falls apart past ~50 rows. (from ticket body)
✅ Users: customer success ICs; acme-inc's compliance team is the immediate driver. (from ticket body + ana's note)
✅ Surface: new endpoint GET /api/customers.csv matching the existing JSON endpoint. (from ticket body)
✅ Acceptance criteria: RFC 4180, BOM (Karen at midmarket-corp), streaming (108k row tenant), auth + tenant match, specific filename. (from ticket AC checklist)
✅ Out of scope: the UI button (PLAT-4472), Excel .xlsx (PLAT-3201), per-column filtering (PLAT-3987). (from ticket)

Three questions remaining for you:

The ticket investigation notes point at src/api/audit-log.csv.ts (PLAT-4011) as the streaming template. Confirm I should reuse that pattern, or do you want a fresh approach?
CSV escaping: RFC 4180 strictly, or do we need Excel-specific quirks beyond the BOM (e.g. a sep= hint line, formula-injection escaping, a UTF-16 stub for the cohort that can't open UTF-8)?
Filename — the spec says customers-{tenant-slug}-{yyyy-mm-dd}.csv. Any time-zone preference for the date, or stick with UTC?

1. Yes, reuse the audit-log streaming pattern. 2. RFC 4180 only; the one Excel user just needs the BOM. 3. UTC is fine.

claude oc-app-architect

Reading src/services/customers.ts to confirm the getCustomersForTenant signature before drafting:

// src/services/customers.ts (current)
export async function getCustomersForTenant(opts: {
  tenantId: string;
  filters: CustomersFilter;
  cursor?: Cursor;
  limit?: number;
}): Promise<CustomersResult>;

It already takes {tenantId, filters, cursor, limit}; it does not
expose a streaming variant — the function returns a fully-materialised
CustomersResult. I'll add an overload:

export function getCustomersForTenant(opts: {
  tenantId: string;
  filters: CustomersFilter;
  stream: true;
}): AsyncIterable<CustomerRow>;

JSON callers stay on the existing signature; the CSV endpoint gets the
async-iterable variant via the cursor under the hood. Mirrors what
audit-log.csv.ts does.
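
A minimal sketch of the cursor-to-stream idea described above, in Python for illustration only (the real overload is TypeScript). The fetch_page callback and the (rows, next_cursor) page shape are assumptions, not the actual internals of getCustomersForTenant or audit-log.csv.ts.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

# Hypothetical page shape: (rows, next_cursor); next_cursor is None on the last page.
Page = tuple[list[dict], Optional[str]]
FetchPage = Callable[[Optional[str]], Awaitable[Page]]


async def stream_rows(fetch_page: FetchPage) -> AsyncIterator[dict]:
    """Yield rows one at a time while holding at most one page in memory."""
    cursor: Optional[str] = None
    while True:
        rows, cursor = await fetch_page(cursor)
        for row in rows:
            yield row
        if cursor is None:
            return


async def main() -> None:
    # Fake two-page data source standing in for the real cursor query.
    pages = {None: ([{"id": 1}, {"id": 2}], "c1"), "c1": ([{"id": 3}], None)}

    async def fetch_page(cursor: Optional[str]) -> Page:
        return pages[cursor]

    async for row in stream_rows(fetch_page):
        print(row["id"])


if __name__ == "__main__":
    asyncio.run(main())
```

The point of the shape is that the caller never sees a fully materialised result: memory is bounded by the page size, not the tenant size.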

Spec produced (1 file: spec/plat-4471/00-overview.md — existing
project; only the new endpoint). Sprint plan: 1 sprint, 1 deliverable,
~3 hours of build time.

Writing back to Linear:

mcp.linear.add_comment("PLAT-4471",
  body="<!-- opchain:oc-app-architect:sprint-created:PLAT-4471/sprint-1 -->

        Sprint 1 — Add CSV export to /api/customers
        Deliverables:
          - GET /api/customers.csv handler (streaming)
          - csv RFC-4180 helper
          - getCustomersForTenant async-iterable overload
          - 7 tests covering happy + filter + auth + BOM + escape + heap regression
        Definition of done:
          - acceptance criteria all checked
          - oc-code-auditor Grade A or better
          - oc-bug-check PASS
          - staging smoke verified by ana + Karen
        Estimated effort: 3h")

mcp.linear.create_issue(
  parent="PLAT-4471",
  title="Sprint 1: CSV export endpoint",
  type="Chore",
  labels=["opchain", "agent-driven", "sprint"]
) → PLAT-4471/sprint-1

Sprint plan approved? /oc-build will start the Generator → Evaluator loop. Or /contract to negotiate the sprint-1 contract first.

02 · Build → benchmark → audit

Generator/Evaluator loop, then verify heap before /oc-git-sync

oc-app-architect · oc-code-auditor

/oc-build, then verify the heap claim before /oc-git-sync.

claude oc-app-architect

/oc-build — Generator → Evaluator loop:

Generator (round 1)
  files written:
    src/api/customers.csv.ts                (handler, 64 LOC)
    src/lib/csv.ts                          (RFC 4180 helper, 78 LOC)
    src/services/customers.ts               (async-iterable overload, +14 LOC)
    tests/api/customers.csv.spec.ts         (7 tests, 194 LOC)
  build:                                     PASS
  unit tests:                                7 new + 147 existing = 154 PASS

Evaluator round 1
  Functionality:        9/10  — endpoint streams; BOM correct; filters work; auth enforced
  Feature Completeness: 9/10  — every AC checked + DoD step modeled
  Code Quality:         8/10  — solid; minor: csv.ts could be ~30 LOC shorter via generator
  Visual/UX:            n/a   — not a UI sprint
  → PASS
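
For readers who want the RFC 4180 quoting and BOM behaviour spelled out, here is a minimal Python sketch. It is not the contents of src/lib/csv.ts; escape_field and csv_lines are hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator

UTF8_BOM = "\ufeff"  # byte-order mark that lets Excel on Windows detect UTF-8


def escape_field(value: object) -> str:
    """Quote a field per RFC 4180: wrap it in double quotes when it contains
    a comma, a double quote, CR or LF, and double any embedded quotes."""
    text = "" if value is None else str(value)
    if any(ch in text for ch in ',"\r\n'):
        return '"' + text.replace('"', '""') + '"'
    return text


def csv_lines(header: list[str], rows: Iterable[list[object]]) -> Iterator[str]:
    """Yield the CSV one line at a time: BOM plus header first, then each row.
    RFC 4180 specifies CRLF line endings."""
    yield UTF8_BOM + ",".join(escape_field(h) for h in header) + "\r\n"
    for row in rows:
        yield ",".join(escape_field(v) for v in row) + "\r\n"
```

Because csv_lines is a generator, a handler can write each line to the response as it is produced instead of building the whole file first.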

Now the heap regression check. Running the integration test against
big-tenant (108k rows) with heap sampling enabled:

$ vitest run tests/api/customers.csv.spec.ts -t "heap-bounded"
  ✓ stream-big-tenant-heap-bounded  (4.1s)
    heap p50: 31MB
    heap p95: 38MB
    heap peak: 41MB
    rows streamed: 108,442
    wall-clock: 3.2s

38MB p95 vs 622MB on the buffered JSON path: ~16× less heap at p95.
Wall-clock 3.2s vs 4.1s buffered, and the first bytes go out 9× sooner.

Sprint passed in one round, heap regression confirmed. Handing off to oc-git-ops.

claude oc-git-ops

v1.2 PM-aware: the prompt /oc-git-sync follows a recently-touched ticket, so I'll re-read it.

mcp.linear.get_issue("PLAT-4471") → 200
  state:    Todo  → about to transition

Pre-write check on the PR-opened comment marker:

mcp.linear.list_comments("PLAT-4471") → marker
   <!-- opchain:oc-git-ops:pr-opened:#2208 -->
   not present → safe to write
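
The pre-write check above can be sketched as follows. This is an illustration under assumptions: marker_present and post_once are hypothetical helpers, and the post callback stands in for whatever MCP comment call the session actually has.

```python
from typing import Callable, Iterable


def marker_present(comment_bodies: Iterable[str], marker: str) -> bool:
    """True if any existing comment already carries the marker."""
    return any(marker in body for body in comment_bodies)


def post_once(
    comment_bodies: Iterable[str],
    marker: str,
    text: str,
    post: Callable[[str], None],
) -> bool:
    """Post marker + text unless the marker is already on the ticket.
    Returns True when a comment was written, False when it was skipped."""
    if marker_present(comment_bodies, marker):
        return False
    post(f"{marker}\n{text}")
    return True
```

Calling post_once twice with the same marker writes one comment, which is what makes a retry or a resumed session safe.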

Branch + commit + PR shaped from the ticket:

branch:  feat/PLAT-4471-add-csv-export-customers
commit:  feat(api): add CSV export to /api/customers

         Adds GET /api/customers.csv returning RFC 4180 CSV...
         Refs: PLAT-4471

PR #2208 opened on github.com/onramp-inc/platform.

Writing back to Linear:

mcp.linear.add_comment("PLAT-4471",
  body="<!-- opchain:oc-git-ops:pr-opened:#2208 -->
        PR opened: github.com/onramp-inc/platform/pull/2208
        Reviewers: yusuf (streaming pattern owner), priya (PLAT-4472 consumer)")
mcp.linear.save_issue("PLAT-4471", state="In Review")

Auto-invoking oc-bug-check before any merge action; running oc-code-auditor /oc-audit pre-deploy in parallel since the deploy gate will demand it.

claude oc-code-auditor

oc-code-auditor /oc-audit pre-deploy on PR #2208.

Files reviewed:        3 new (customers.csv.ts, csv.ts, customers.csv.spec.ts)
                       1 modified (services/customers.ts)
Findings:              0 blocking, 1 advisory
Advisory #1:           csv.ts could be ~30 LOC shorter using a Symbol.asyncIterator
                       generator; not blocking, leaving as a follow-up note.
Tests:                 7 new, 0 changed, 0 deleted (147 → 154 total)
Coverage:              new code 100% line, 100% branch
Type-check:            PASS
Lint:                  PASS
Secret-scan:           PASS
Grade:                 A
Wall-clock:            12.4s

v1.2: posting back to Linear with a stable marker.

mcp.linear.add_comment("PLAT-4471",
  body="<!-- opchain:oc-code-auditor:audit-posted:#2208 -->
        Auditor: Grade A. 0 blocking, 1 advisory.
        Advisory: csv.ts is functional but could be ~30 LOC shorter via a
        Symbol.asyncIterator generator. Not a blocker; left for a future
        cleanup pass.")

Bug-check returned PASS (1.4s). The PR is mergeable; oc-deploy-ops can take over.

03 · Deploy

oc-deploy-ops creates the deploy ticket and threads through environments

oc-deploy-ops

claude oc-deploy-ops

v1.2: /oc-deploy staging walks the commit range, collects ticket refs, and creates the deploy ticket before doing anything else.

range:           a8a254c..7ea6bf2  (2 commits)
linked tickets:  PLAT-4471
audit gate:      PASS (cached from /oc-git-sync, age 4m)
oc-bug-check:    PASS
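
One way the "collects ticket refs" step could work, assuming refs live in a "Refs:" trailer line as in the commit shown earlier. The regexes and the collect_ticket_refs name are assumptions; the transcript does not say how opchain actually parses commits.

```python
import re

# Assumed conventions: a "Refs:" trailer line, ticket ids shaped like PLAT-4471.
REFS_LINE = re.compile(r"^Refs:\s*(.+)$", re.MULTILINE)
TICKET_ID = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")


def collect_ticket_refs(commit_messages: list[str]) -> list[str]:
    """Return unique ticket ids from Refs: trailers, in first-seen order."""
    seen: list[str] = []
    for message in commit_messages:
        for line in REFS_LINE.findall(message):
            for ticket in TICKET_ID.findall(line):
                if ticket not in seen:
                    seen.append(ticket)
    return seen
```

Restricting the match to the trailer line avoids picking up ids that a commit body merely mentions in passing.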

Pre-create check:
  mcp.linear.list_issues(query="<!-- opchain:oc-deploy-ops:deploy-created:7ea6bf2 -->")
  → no match → safe to create

mcp.linear.create_issue(
  type="Deploy",
  title="Deploy 2026-05-04 14:02 — staging → prod (1 ticket)",
  description="<!-- opchain:oc-deploy-ops:deploy-created:7ea6bf2 --> ...",
  parent_relations=["PLAT-4471"],
  labels=["opchain", "agent-driven", "deploy", "environment:prod"]
) → PLAT-4485

→ wrangler deploy --env staging
✓ deployed in 37s
✓ smoke tests passed (5/5)

mcp.linear.add_comment("PLAT-4485",
  body="<!-- opchain:oc-deploy-ops:staging-verified:PLAT-4485 -->
        Smoke tests passed against staging.api.onramp.dev. Audit gate clean. Awaiting human approval for prod.")
mcp.linear.save_issue("PLAT-4485", state="Staging verified")

Prod gate is open when you're ready. Smoke results in the deploy-ticket artifact.

Pinged ana, Karen verified BOM works on her Windows box. Approve. /oc-deploy prod.

claude oc-deploy-ops

→ wrangler deploy
✓ deployed in 42s
✓ smoke tests passed (5/5)
✓ prod headers: X-Opchain-Version: 7ea6bf2

mcp.linear.add_comment("PLAT-4485",
  body="<!-- opchain:oc-deploy-ops:prod-shipped:PLAT-4485 -->
        Prod deployed: 7ea6bf2. Smoke tests pass.")
mcp.linear.save_issue("PLAT-4485", state="Shipped")

mcp.linear.add_comment("PLAT-4471",
  body="<!-- opchain:oc-deploy-ops:linked-shipped:PLAT-4485:PLAT-4471 -->
        Shipped to prod via deploy PLAT-4485.")
mcp.linear.save_issue("PLAT-4471", state="Done")

PLAT-4471 is closed; PLAT-4485 is the deploy record. Deploy time start-to-prod was 8 min, including the human approval pause.

Asking oc-monitoring-ops to attach a 30-min post-deploy sentinel against api.onramp.dev.

claude oc-monitoring-ops

/watch api.onramp.dev — 30-min sentinel armed against the api/customers dashboard.

[14:11Z]  sentinel armed: 30 min window
[14:42Z]  sentinel complete — 0 alerts fired, p99 latency steady ~120ms across
          both JSON and CSV variants; heap p95 stayed in budget; no Sentry events

mcp.linear.add_comment("PLAT-4485",
  body="<!-- opchain:oc-monitoring-ops:sentinel-complete:PLAT-4485 -->
        30-min post-deploy sentinel complete. No anomalies on
        api/customers dashboard. p99 latency steady ~120ms across both JSON
        and CSV variants. No customers_csv_p95_latency_high fires.")

Sentinel clean. PLAT-4485 stays Shipped.

04 · Incident the next morning

oc-monitoring-ops opens an incident ticket linked to the deploy

oc-monitoring-ops

claude oc-monitoring-ops

v1.2: alert customers_csv_p95_latency_high (SEV-3) fired at 09:14:33 UTC. Looking up the runbook + most recent deploy.

recent deploys (last 24h): PLAT-4485 (introduced /api/customers.csv, ~19h ago) ← likely culprit
on-call: sam@onramp.dev
runbook: docs/runbooks/api-latency-investigation
escalation: PagerDuty EP-PLAT-BACKEND (from pm.yaml pagerduty_routing)

Pre-create check:
  mcp.linear.list_issues(query="<!-- opchain:oc-monitoring-ops:incident-fired:PD-INC-018347 -->")
  → no match → safe to create

mcp.linear.create_issue(
  type="Incident",
  title="Incident: customers.csv p95 latency · 2026-05-05 09:14 UTC",
  description="<!-- opchain:oc-monitoring-ops:incident-fired:PD-INC-018347 -->
               <<full alert payload + symptoms + dashboards described + likely-culprit>>",
  parent_relations=["PLAT-4485"],
  labels=["opchain", "agent-driven", "incident", "service:api/customers.csv", "severity:SEV-3", "postmortem-required"]
) → PLAT-4503

→ pagerduty page sam

The on-call and the relevant deploy are both linked. Paging sam now.

claude oc-monitoring-ops

Alert auto-resolved at 09:30:14 (16 min impact). sam shipped the cursor-batch fix and published the postmortem at 11:48; four remediation sub-tickets were auto-filed.

mcp.linear.add_comment("PLAT-4503",
  body="<!-- opchain:oc-monitoring-ops:auto-resolved:PLAT-4503 -->
        Alert auto-resolved (16 min). Transitioning to Resolved (PM pending).")
mcp.linear.save_issue("PLAT-4503", state="Resolved (PM pending)")

mcp.linear.create_issue x4 (sub-tickets, parent=PLAT-4503):
  PLAT-4504  Fix Postgres cursor BATCH_SIZE read                (sam, done same-hour)
  PLAT-4505  Add p95 800ms warn alert tier                      (vee)
  PLAT-4506  Add big-tenant CSV load test to CI on every PR     (yusuf)
  PLAT-4507  Document cursor.read() default in style guide      (priya)

mcp.linear.add_comment("PLAT-4503",
  body="<!-- opchain:oc-monitoring-ops:postmortem:PLAT-4503 -->
        Postmortem published: docs.onramp.dev/postmortems/2026-05-05-customers-csv-latency
        4 remediation sub-tickets filed.")
mcp.linear.save_issue("PLAT-4503", state="Done")

The whole thread — feature ticket, sprint child, PR, deploy ticket, incident ticket, four remediation sub-tickets — lives in Linear under PLAT-4471. Anyone searching for it two weeks later sees the full audit trail without leaving the PM tool. The grep-able markers per skill are documented in the timeline artifact.

v1.3 hero · runtime PM-MCP loop

Ticket → ship → incident → postmortem on one Linear thread

18 steps · 4 stages · ~12 minutes

09:14Z — alert paged Maya, she filed PLAT-5102

p99 climbed 120ms → 3.4s on /api/customers. The opchain pipeline takes the ticket from here.

PagerDuty paged me on api-latency-customers-p99. I filed PLAT-5102 with the alert payload. Take it and `/oc-git-sync PLAT-5102`.

claude oc-git-ops

On it. Reading PLAT-5102 from Linear, then walking the oc-git-ops PM-MCP loop. The v1.3 protocol gives me concrete tool names + a marker check before any write.

Step 1 — fetch the ticket.
mcp__claude_ai_Linear__get_issue(id="PLAT-5102") → 142ms, retriable failure budget untouched.

The ticket is typed Bug, points at the index on (team_id, created_at), and links the 2026-05-07 03:05Z migration cleanup as the suspect commit. That gives me what I need to bisect.

Step 2 — bisect. Reading the migration commit before writing any code.

$ git log --oneline -- migrations/
a221f08  2026-05-07 03:05  data: drop dupe indexes (cleanup)
8f3c7d2  2026-05-04 14:10  feat: customer.csv streaming

$ git show a221f08 -- migrations/
diff --git a/migrations/20260507_drop_dupe_indexes.sql b/migrations/20260507_drop_dupe_indexes.sql
+++ b/migrations/20260507_drop_dupe_indexes.sql
+DROP INDEX IF EXISTS idx_customers_team_id_created_at;
+DROP INDEX IF EXISTS idx_events_team_id_created_at;
+-- ... 6 more

Confirmed. The migration dropped idx_customers_team_id_created_at
thinking it was a duplicate of idx_customers_team_id. It's not — the
composite is the only one that supports the cursor's compound predicate
(team_id = $1 AND created_at > $2 ORDER BY created_at). We need it
back.

Step 3 — write the fix + the regression test.

-- migrations/20260508_restore_paginate_index.sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS
  idx_customers_team_id_created_at
  ON customers (team_id, created_at DESC);

# tests/regression/test_customer_list_plan.py
def test_planner_uses_composite_index_after_fix():
    """Asserts the planner picks Index Scan, not Seq Scan, for the
    customer-list cursor query. Catches the regression if the index
    gets dropped again."""
    explain = db.execute(
        "EXPLAIN (FORMAT JSON) "
        "SELECT id, name FROM customers "
        "WHERE team_id = %s AND created_at > %s "
        "ORDER BY created_at LIMIT 200",
        ("team_acme", "2025-01-01"),
    ).fetchone()[0]
    node = explain[0]["Plan"]
    assert _find_node_type(node, "Seq Scan on customers") is None, (
        "Planner reverted to Seq Scan; composite index likely missing"
    )
    assert _find_node_type(node, "Index Scan") is not None, (
        "Expected Index Scan on idx_customers_team_id_created_at"
    )

Step 4 — pre-commit gate (oc-bug-check) runs and passes (12.4s).

Step 5 — open the PR. Pre-write check: mcp__claude_ai_Linear__list_comments(PLAT-5102) → no marker match → safe to write.

mcp__claude_ai_Linear__save_comment:

<!-- opchain:oc-git-ops:pr-opened:#2243 -->
PR opened: https://github.com/onramp-inc/platform/pull/2243
Fix: restore composite index dropped by a221f08.
Regression test added.

Then mcp__claude_ai_Linear__save_issue(id="PLAT-5102", state="In Review") — transitions PLAT-5102.

Branch + commit + PR + comment + state transition. Five tool calls, zero retries. Handing off to oc-code-auditor + oc-bug-check then to oc-deploy-ops for staging.

claude oc-code-auditor

oc-code-auditor /oc-audit pre-deploy on PR #2243.

Files reviewed:        2 (migrations/20260508_restore_paginate_index.sql,
                          tests/regression/test_customer_list_plan.py)
Findings:              0 blocking, 1 advisory
Advisory #1:           consider adding a comment to services/customers.ts
                       documenting which index supports the cursor query
                       (so the next "duplicate index cleanup" doesn't repeat
                       this incident). Not blocking; logging as PLAT-5107.
Tests:                 1 new regression, 153 existing, 0 failing
Coverage:              new code 100%
Type-check:            PASS
Lint:                  PASS
Grade:                 A-
Wall-clock:            9.2s

v1.3: posting back to Linear with a stable marker.

mcp__claude_ai_Linear__save_comment("PLAT-5102",
  body="<!-- opchain:oc-code-auditor:audit-posted:#2243 -->
        Auditor: Grade A-. 0 blocking, 1 advisory.
        Advisory: document which index supports the cursor query
        (filed as PLAT-5107 for future cleanup).")

Bug-check returned PASS (1.1s). PR is mergeable; oc-deploy-ops can take over.

10:08Z — staging deploy

Audit gate green. oc-deploy-ops creates PLAT-5103 (deploy ticket) parent-linked to PLAT-5102.

oc-deploy-ops

/oc-deploy staging

claude oc-deploy-ops

Audit gate (oc-code-auditor + oc-security-auditor + oc-bug-check) → all PASS, cached per-SHA from /oc-git-sync.

Pre-create check for the deploy ticket:
mcp__claude_ai_Linear__list_issues(team="PLAT", query="opchain:oc-deploy-ops:deploy-created:prod:ea71b9c") → no match → safe to create.

mcp__claude_ai_Linear__save_issue (no id — creates):

title: "Deploy: prod c4f8a21..ea71b9c"
description: <!-- opchain:oc-deploy-ops:deploy-created:prod:ea71b9c --> ...
issue_type: Deploy
parent: PLAT-5102
labels: [opchain, agent-driven, deploy, hotfix]

→ PLAT-5103 created.

Wrangler deploy --env staging → 34s. Migration applied (concurrent index build 41s on staging's customer table — production will take ~67s based on row-count ratio). Smoke tests:

/api/customers?cursor=...&limit=200 p99 = 78ms ✓ (was 3.4s on prod baseline)
/api/health 200, SHA ea71b9c ✓
/api/customers.csv?limit=200 200, BOM intact ✓
k6 run --vus 50 --duration 60s — p95 142ms, p99 198ms ✓ (5× current load held under threshold)

Pre-write check for the staging-verified comment: list_comments(PLAT-5103) → no match → write it. save_issue(id="PLAT-5103", state="Staging verified").

Ready to ship to prod.

/oc-deploy prod

claude oc-deploy-ops

Wrangler deploy (no --env) → 8.9s build + traffic shift; concurrent index build on prod took 67s in the background, non-blocking. Traffic shifted at 10:38:08Z. /api/health confirms ea71b9c.

Real-time validation:

p99 latency dropped from 3.4s to ~110ms within 15s of traffic shift
acme-inc tenant slice dropped from 4.9s p99 to 130ms
query plan flipped to Index Scan immediately on first request post-index-build

Comment on PLAT-5103 (prod-shipped):
mcp__claude_ai_Linear__save_comment → 503 from Linear. Retry policy kicks in (per protocol §4):

Attempt 2 (1.0s wait + jitter) → 503
Attempt 3 (3.0s wait + jitter) → 503
Budget exhausted at 15.1s.

Per protocol §4, deferring instead of failing. Writing to oc-deploy-ops.checkpoint.json:

"pm_deferred_actions": [{
  "id": "deferred-2026-05-08T10:38:14Z-7c2f",
  "skill": "oc-deploy-ops",
  "verb": "/oc-deploy",
  "operation": "add_comment",
  "marker": "<!-- opchain:oc-deploy-ops:prod-shipped:PLAT-5103 -->",
  "retriable": true,
  "last_error": "Linear API returned 503 (3/3 attempts within 15s budget)",
  ...
}]
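
A rough sketch of the retry-then-defer behaviour traced above. The waits (1.0s, 3.0s) and the 15s budget come from the trace; write_or_defer, TransientError, the jitter range and the record fields beyond marker / retriable / last_error are assumptions, not the protocol's actual definitions.

```python
import random
import time
from typing import Callable

# Base waits before attempts 1, 2 and 3; retries add jitter on top.
BACKOFF_SECONDS = (0.0, 1.0, 3.0)
BUDGET_SECONDS = 15.0


class TransientError(Exception):
    """Stands in for a retriable failure such as an HTTP 503."""


def write_or_defer(
    write: Callable[[], None],
    marker: str,
    deferred: list,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = lambda: random.uniform(0.0, 0.25),  # assumed range
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Try the write up to three times inside the budget.
    On exhaustion, queue a deferred record instead of raising."""
    started = clock()
    attempts = 0
    last_error = ""
    for base in BACKOFF_SECONDS:
        if clock() - started >= BUDGET_SECONDS:
            break
        if base:
            sleep(base + jitter())
        attempts += 1
        try:
            write()
            return True
        except TransientError as exc:
            last_error = str(exc)
    deferred.append(
        {"marker": marker, "retriable": True, "attempts": attempts, "last_error": last_error}
    )
    return False
```

Returning False rather than raising is the design choice that lets the deploy itself finish while the comment waits in the queue.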

The deploy itself is fine. ea71b9c is live; PLAT-5102 is fixed; latency restored. The only thing missing is the audit comment on Linear, which I'll flush later.

Surfacing: /oc-deploy --retry-pm to flush 1 deferred action.

Try the flush.

claude oc-deploy-ops

/oc-deploy --retry-pm:

Reading oc-deploy-ops.checkpoint.json → 1 entry, retriable: true.

Pre-write check: mcp__claude_ai_Linear__list_comments(PLAT-5103) → the deferred entry's prod-shipped marker is not present → safe to write.

mcp__claude_ai_Linear__save_comment → OK (Linear recovered ~3 minutes ago).

Removing the deferred entry, appending to pm_flush_log[]:

flushed 1 / failed 0 / skipped 0

If you'd run this twice by mistake, the second pass would match the marker and remove the entry without re-posting. Same shape would apply if a session crashed mid-flush.

PLAT-5103 is now Shipped. Each linked ticket (PLAT-5102) gets its own linked-shipped comment, tagged with its own marker. PLAT-5102 transitions to Done.

What if the deferred queue had grown to N entries — say if Linear was out for an hour and we ran a half-dozen deploys?

claude oc-deploy-ops

Bounded growth is by design. Three policies kick in:

Per-skill cap. Each skill's pm_deferred_actions[] is capped at 32 entries. Overflow doesn't silently drop — it triggers a retriable: false user-facing error so you know the queue is filling. The cap prevents an unattended session from accumulating thousands of deferred writes if an upstream is out for days.
Per-marker deduplication. Two deferred actions with the same marker collapse to one entry on insertion. So if you ran /oc-deploy six times during a Linear outage and each tried to post the same prod-shipped comment on the same ticket, you'd still have one queue entry, not six. (Different deploys produce different markers because they include the deploy id.)
TTL on deferred entries. Entries older than 7 days are pruned with a surfaced warning ("3 deferred comments older than 7d were dropped — they're audit-log only, not retryable"). The pipeline assumes that if you haven't flushed in a week, the comment is no longer load-bearing.

The flush behaviour scales linearly: --retry-pm walks the queue, runs the marker pre-check on each entry, attempts each that isn't already posted. If Linear is still flaky during the flush, those re-defer with their attempts counter incremented; once attempts hit 6 the entry escalates to retriable: false and surfaces as a user-facing error.
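
The three policies and the flush pass can be sketched together. The numbers (32 entries, 7 days, 6 attempts) come from the description above; the class and method names, and counting an already-posted entry as "skipped", are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable

MAX_ENTRIES = 32
TTL = timedelta(days=7)
MAX_ATTEMPTS = 6


class QueueFull(Exception):
    """Raised instead of silently dropping a write when the cap is hit."""


class PostFailed(Exception):
    """Stands in for a retriable failure while flushing."""


@dataclass
class Deferred:
    marker: str
    created_at: datetime
    attempts: int = 0
    retriable: bool = True


@dataclass
class DeferredQueue:
    entries: dict = field(default_factory=dict)  # marker -> Deferred

    def add(self, marker: str, now: datetime) -> Deferred:
        if marker in self.entries:            # per-marker dedupe
            return self.entries[marker]
        if len(self.entries) >= MAX_ENTRIES:  # per-skill cap
            raise QueueFull(f"deferred queue is full ({MAX_ENTRIES} entries)")
        self.entries[marker] = Deferred(marker, now)
        return self.entries[marker]

    def prune(self, now: datetime) -> int:
        """Drop entries older than the TTL; return how many were dropped."""
        stale = [m for m, e in self.entries.items() if now - e.created_at > TTL]
        for marker in stale:
            del self.entries[marker]
        return len(stale)

    def flush(self, already_posted: Callable[[str], bool],
              post: Callable[[str], None]) -> tuple:
        """One pass over the queue; returns (flushed, failed, skipped)."""
        flushed = failed = skipped = 0
        for marker, entry in list(self.entries.items()):
            if not entry.retriable:           # escalated earlier; needs a human
                failed += 1
            elif already_posted(marker):      # marker pre-check: nothing to re-post
                del self.entries[marker]
                skipped += 1
            else:
                try:
                    post(marker)
                except PostFailed:
                    entry.attempts += 1
                    entry.retriable = entry.attempts < MAX_ATTEMPTS
                    failed += 1
                else:
                    del self.entries[marker]
                    flushed += 1
        return flushed, failed, skipped
```

Keying the queue by marker gives the dedupe for free, and the flush does one pre-check and at most one post per entry, so it stays linear in queue length.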

In F500 / regulated environments (where a broker sits between us and Linear), the broker has its own retry / dead-letter-queue policy underneath, so the agent's queue stays small even during long outages. See the audit-pipeline artifact for the broker shape.

10:12Z — unrelated incident fires

oc-monitoring-ops opens PLAT-5104 against PLAT-5103 (parent link), demonstrates marker-based dedupe on the burst.

oc-monitoring-ops

claude oc-monitoring-ops

Alert webhook-replay-queue-depth at 10:12:01Z — separate service from the one I just fixed.

Pre-create check: list_issues(team="PLAT", query="opchain:oc-monitoring-ops:incident-fired:evt-2026-05-08-101201") → no match → safe to create.

save_issue opens PLAT-5104. Parent set to PLAT-5103 (most recent open deploy ticket — likely-culprit link, even though I'll quickly note in the body that the deploy doesn't touch this service).

Second alert fires at 10:17:42Z (queue depth 4,621). The pre-create check would normally collide on the alert-event-id, but burst events carry a per-event marker, so I append a comment instead of opening a duplicate ticket. Three more alerts fire in the next four minutes; same dedupe: three comments, one ticket.

Auto-resolves at 10:31:29Z (queue drained; upstream partner-x recovered). Postmortem published 11:18Z; remediation sub-tickets PLAT-5105 + PLAT-5106 opened, each with its own per-item marker (the second ends in ...:item-2 -->). PLAT-5104 transitions to Done.

11:20Z — release recap

Six skills wrote on this thread. Every comment carries a marker; every retry resolved cleanly.

Summarise — what made the v1.3 loop work where v1.2's prose didn't?

claude oc-integrations-engineer

Three things landed in v1.3 that turned the v1.2 prose into a runtime.

1. Concrete tool-name registry. v1.2 said "call mcp.<provider>.get_issue". A Claude session reading that prose had to guess the actual tool name — mcp__claude_ai_Linear__get_issue vs linear.getIssue vs whatever. v1.3's pm-mcp-protocol.md §1 is a flat table that a session can match against the tools its MCP server actually exposes. tool_overrides in pm.yaml handles the brokered / regulated case (corp-prefixed tools).

2. Idempotency markers, not just retries. Every comment composed by every skill carries an <!-- opchain:... --> marker. The pre-write check fetches comments and matches the marker before posting. So when:

the session crashed mid-deploy and was resumed, the marker matched → no duplicate post;
Linear returned 503 mid-flush and we retried twice, the second pass matched → no duplicate post;
the user typed --retry-pm twice by accident, the second pass matched → no-op.

This is what makes the loop safe to retry, not just retriable.

3. Deferred-action queue with explicit retriable flag. The 503 mid-deploy didn't crash the deploy. The intended write was queued in oc-deploy-ops.checkpoint.json with retriable: true — /oc-deploy --retry-pm flushes it later. 4xx-non-429 errors get retriable: false so a scope-violation broker rejection surfaces to the user instead of retrying forever. The queue is bounded at 32 entries per skill with 7d TTL.
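
The retriable flag described in point 3 reduces to a small status classifier. A sketch, assuming 429 and all 5xx responses count as retriable (the text only names 503 and "4xx-non-429" explicitly); is_retriable is a hypothetical name.

```python
def is_retriable(status: int) -> bool:
    """Classify a failed PM-tool response.

    429 and 5xx: transient, so the write is deferred with retriable: true.
    Any other 4xx (for example a scope-violation rejection): surfaced to the
    user with retriable: false instead of being retried.
    """
    if status < 400:
        raise ValueError(f"{status} is not an error status")
    if status == 429:
        return True
    return status >= 500
```

So is_retriable(503) and is_retriable(429) are True, while is_retriable(403) is False.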

The audit pipeline trace (artifact below) shows what a brokered environment would log for this run — same ops, same markers, plus broker-side correlation ids tying agent calls to the audit record.

v1.3 supporting · meta dogfood

oc-release-ops ships its own release (opchain v1.3 dogfood)

12 steps · 2 stages · ~6 minutes

Sprint 4 just passed evaluator

Three scenarios written, /changelog drafted, version-bump pending. Time to actually ship.

/oc-release plan

claude oc-release-ops

Reading every .checkpoints/<id>.checkpoint.json and the git log between the v1.2 marker and HEAD.

Catalog deltas: +1 skill (oc-release-ops), 0 removed, 5 modified, 3 references added, 3 platform-expansion sections, +2 flags, +1 validator.

Decision tree (semver-decisions.md):

Rule 1 (checkpoint schema change?) — no
Rule 2 (verb removed?) — no
Rule 3 (pipeline-map drop / rename?) — no
Rule 4 (new skill added?) — yes (oc-release-ops) → MINOR

→ v1.2.0 → v1.3.0
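
The decision tree above, as a sketch. Only Rule 4 → MINOR is shown in the trace; that Rules 1-3 map to MAJOR and that the fallthrough is PATCH are assumptions here, not quotes from semver-decisions.md.

```python
def decide_bump(
    schema_changed: bool,
    verb_removed: bool,
    pipeline_map_changed: bool,
    skill_added: bool,
) -> str:
    """Walk the rules in order and return the bump level."""
    if schema_changed or verb_removed or pipeline_map_changed:
        return "major"  # assumption: the trace only shows Rules 1-3 answering "no"
    if skill_added:
        return "minor"  # Rule 4, as shown in the trace
    return "patch"      # assumption: fallthrough is not shown in the trace


def bump(version: str, level: str) -> str:
    """Apply a bump level to a vMAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if level == "major":
        return f"v{major + 1}.0.0"
    if level == "minor":
        return f"v{major}.{minor + 1}.0"
    if level == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level}")
```

With the answers from the trace (no, no, no, yes), bump("v1.2.0", decide_bump(False, False, False, True)) gives "v1.3.0".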

Theme: "Runtime PM, real platforms, automated releases."

Headline ranking drives the changelog draft order:

PM-MCP loop is real
oc-release-ops is the 18th skill
Four new platforms
Three new walkthroughs
v1.2 carry-over closed (Playwright + LHCI)

Skills to bump: all 18 in lockstep.

Approve / override?

Approve as-is. /oc-release draft.

claude oc-release-ops

Reading the previous /changelog entry as the layout template (per changelog-recipe.md); generating the new <section class="release release--current"> from the Sprint 1-4 checkpoints + the headline ranking.

The diff against site/src/pages/changelog.astro is in the artifact below — 4 "What's new" bullets (one per headline), 3 scenario callouts, configuration callout for tool_overrides, and the required compatibility section.

The previous v1.2 entry's tag class flipped from rel-tag to rel-tag rel-tag--past so the chrome demotes it; the existing CSS handles both classes.

Each "What's new" bullet is ≤ 280 chars (changelog reading rhythm rule from changelog-recipe.md). Each scenario callout is ≤ 600 chars. Compatibility section is non-empty. /oc-release verify will re-check these before shipping.
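
The three draft rules lend themselves to a mechanical check. A sketch of what such a check might look like; check_changelog and its inputs are hypothetical, and how /oc-release verify actually extracts bullets from changelog.astro is not shown in this transcript.

```python
MAX_BULLET_CHARS = 280
MAX_CALLOUT_CHARS = 600


def check_changelog(bullets: list[str], callouts: list[str], compatibility: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the draft passes."""
    problems: list[str] = []
    for index, bullet in enumerate(bullets, start=1):
        if len(bullet) > MAX_BULLET_CHARS:
            problems.append(f"bullet {index} is {len(bullet)} chars (max {MAX_BULLET_CHARS})")
    for index, callout in enumerate(callouts, start=1):
        if len(callout) > MAX_CALLOUT_CHARS:
            problems.append(f"callout {index} is {len(callout)} chars (max {MAX_CALLOUT_CHARS})")
    if not compatibility.strip():
        problems.append("compatibility section is empty")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets a draft be fixed in a single pass.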

Approve?

Approve. /oc-release bump.

claude oc-release-ops

Atomic write batch:

18 × skills/<id>/SKILL.md frontmatter version: 1.2.0 → 1.3.0
site/src/pages/styleguide.astro badge v1.2.0 → v1.3.0
site/src/pages/index.astro release-pill v1.2 shipped → v1.3 shipped and href="/changelog#v1.3"

Per oc-release-ops/references/version-locations.md, NOT bumped: package.json, build.mjs constants, site/package.json, vitest.config.js define — these are runtime / build artefacts, intentionally decoupled from the marketing version (CLAUDE.md "Version stamp" section).

npm run gen-catalog confirms 18 skills validate at v1.3.0.

/oc-release announce.

claude oc-release-ops

Pre-create check on the release ticket: mcp__claude_ai_Linear__list_issues(team="ADEV", query="opchain:oc-release-ops:release-ticket:v1.3.0").

→ 1 match: ADEV-306 (the parent tracking ticket the human created on 2026-05-07; marker was added when oc-app-architect dogfooded the protocol).

Reusing — not creating a duplicate. Adding the bump-committed comment with marker:

mcp__claude_ai_Linear__save_comment(issueId="ADEV-306", body="\n\nVersion bump committed...").

Generated announcement copy at releases/v1.3.0/announcement-internal.md (artifact below) and releases/v1.3.0/announcement-external.md (similar shape, blog/social tone).

Ready for /oc-release ship.

/oc-release ship.

claude oc-release-ops

/oc-release verify runs the full pre-ship gate:

✓ gen-catalog (18 skills)
✓ validate-pm-mcp (5 skills, 0 warnings)
✓ gen-flags (92 flags)
✓ npm test (124 tests pass)
✓ site:build (27 pages)
✓ /changelog has v1.3 entry
✓ all skill versions === v1.3.0
✓ styleguide badge === v1.3.0

Handing off to oc-git-ops for the release PR. Then oc-deploy-ops for staging → prod. CLAUDE.md says deploys are manual from a wrangler-logged-in laptop — oc-release-ops invokes oc-deploy-ops, which surfaces npm run deploy:staging and npm run deploy for the user to run; oc-release-ops does NOT bypass the manual gate.

Full handoff trace in the artifact.

11:31Z — opchain v1.3.0 shipped (by opchain v1.3.0)

ADEV-306 transitions to Shipped. oc-release-ops checkpoint records the v1.3.0 entry in history[]. The skill ate its own dogfood.

v1.3 supporting · platform expansion proof

Django + Postgres + Render, shipped by opchain

25 steps · 6 stages · ≈ 10 exchanges

Day 1 — Priya files inv-org/quill#1

She types the idea into a GH issue and runs `/oc-discover --ticket inv-org/quill#1`. opchain pulls the issue body via mcp__mcp-server-github__issue_read.

oc-app-architect

/oc-discover --ticket inv-org/quill#1 The ticket has the rough idea: invoicing tool for small accounting firms. Solo founder, want to go fast.

claude oc-app-architect

Reading the issue. mcp__mcp-server-github__issue_read(owner=inv-org, repo=quill, number=1).

The ticket is rough but useful: I can pre-fill 4 of the 13 discovery questions from it. Let me confirm the remaining 9:

The ticket says "small accounting firms." How small — solo CPAs? 3-15-person partnerships? Mid-market with their own ops team?
The ticket says "fast." Time horizon — 30 days, 90 days, 180 days?
Anti-goals? Anything you want to make sure this is NOT?
Pricing model? SaaS subscription, per-firm install, free + paid tiers?
Why Python? You mentioned a preference but I want to understand the constraint — is it experience, ecosystem, deployment posture, or all three?
UI shape? Server-rendered admin tool, SPA dashboard, or hybrid?
Admin panel needed early? You'll be running design-partner onboardings yourself for months; do you want a Django-admin-style internal tool out of the box?
PDF requirements? Bespoke templates per firm, single GAAP-friendly template, or just plain HTML-to-PDF?
Email transport? Any vendor preference, or any contract that locks us into one?

(I'm not going to skip the discovery questions just because there's a ticket — but I'll ask the ones the ticket leaves ambiguous. The ones I'm pre-filling: "what's the problem," "who's the user," "what's the current pain," "what's the wedge.")

1) 3-15-person partnerships, the kind that hate QuickBooks but can't afford NetSuite. 2) 90 days to first paying firm — Sarah's firm signed an LOI. 3) Not multi-tenant SaaS-at-scale; per-firm install. Not mobile-first. 4) Per-firm install, ~$200-500/mo per firm. 5) 8 years of Python at a fintech; I'll move 5x faster in Django. 6) Server-rendered. Sarah won't tolerate SPA load times. 7) Yes, Django admin is the cheapest internal tool humanly possible. 8) Single GAAP-friendly template; no charts. 9) Whatever's cheapest with good deliverability.

claude oc-app-architect

Got it. Writing spec/00-project-overview.md with "Source ticket: inv-org/quill#1" cited at the top.

The discovery output gets richer than typical because the ticket gave directional context. Sarah (ops partner) is the protagonist; the partnership-aware data model is the wedge; $200-500/mo with per-firm install is the pricing posture. Anti-personas (the firm's clients — they never see Quill directly) get a §4.3 because they're a load-bearing decision: if we add a client portal we own a whole new product line.

Also writing a competitor matrix (§6), pricing rationale (§7) with the QBO + opportunity-cost math, and a risk register (§12) with 6 explicit risks. The risk that matters most is R3 — per-firm install ops time. We'll instrument that from day 1 (Priya's hours per firm per month, manually tracked).

/oc-spec next — auto-invoking oc-stack-forge to pick the stack. I'll show you the scorecard before committing.

Before oc-stack-forge runs — quick sanity check. Postgres feels like overkill for a single-tenant install. Why not SQLite?

claude oc-app-architect

Fair pushback. SQLite scored 39.7 on the database scorecard (vs Postgres-on-Render at 45.9) — close enough that it's worth saying out loud why I picked Postgres.

Three reasons:

Concurrency. gunicorn runs multiple workers (3 in the prod render.yaml). SQLite supports multi-reader / single-writer with WAL mode, but Django's ORM doesn't model that well: you'll hit "database is locked" under modest write contention (e.g. a partner uploading a CSV while another partner saves an edit). At Sarah's firm's scale it'd probably be fine; at 3 firms in 6 months it gets sketchy.
Backups. Render Postgres has built-in daily backups with 7-day retention; restore is a UI button. SQLite means we own the backup story — pg_dump-equivalent for SQLite is workable but it's another runbook to write and another way for our customers' invoicing data to disappear.
Forward-compat. If you ever add a worker process (Sprint 4 PDF batching is on the roadmap; see commented-out render.yaml block), or a cron job, or a second app server, SQLite stops being the right answer. Migrating SQLite → Postgres at year 1 with live customer data is a weekend you don't want.
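
The single-writer limit behind the concurrency point can be reproduced with Python's standard-library sqlite3 module alone. This is a minimal sketch, not Quill code: the `entry` table and the function name are made up for illustration, and `timeout=0` is an assumption chosen so the second writer fails immediately instead of waiting out the default busy timeout.

```python
import os
import sqlite3
import tempfile


def second_writer_result() -> str:
    """Open two connections to one SQLite file and try to write from both."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

    # isolation_level=None means autocommit, so the explicit BEGIN / ROLLBACK
    # statements below are the only transactions in play.
    first = sqlite3.connect(path, isolation_level=None)
    first.execute("PRAGMA journal_mode=WAL")
    first.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, note TEXT)")

    # Writer 1 takes the write lock and holds it (think: a long CSV import).
    first.execute("BEGIN IMMEDIATE")
    first.execute("INSERT INTO entry (note) VALUES ('csv row')")

    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        # Reads still succeed in WAL mode while a writer is active.
        second.execute("SELECT COUNT(*) FROM entry").fetchone()
        # A second writer cannot take the lock while the first holds it.
        second.execute("BEGIN IMMEDIATE")
        return "acquired"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        first.execute("ROLLBACK")
        first.close()
        second.close()


print(second_writer_result())
```

With a non-zero timeout the second writer would wait and usually succeed, which is why the problem only shows up once write transactions get long or frequent.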

The cost delta is $7/mo. Even at 25 firms it's $425/mo, which is rounding error against $8,750 MRR.

OK to keep Postgres? If yes I'll let oc-stack-forge finish; if you want the SQLite scorecard revisited, I'll re-run with adjusted weights (probably bumping "Cost@scale" and "Solo-dev simplicity" significantly).

Yeah, keep Postgres. Concurrency clinches it.

Day 1 — oc-stack-forge picks Django/Render automatically

v1.3's Platform Matrix decision tree resolves cleanly. Six scorecards run; anti-picks recorded.

oc-stack-forge

claude oc-stack-forge

Running the v1.3 "Platform Matrix" against discovery context. Six scorecards:

Layer	Pick	Weighted	Runner-up	Margin
Backend framework	Django	51.4	Rails (47.6)	+8%
Database	Render Postgres	45.9	Neon (41.7)	+10%
Host	Render	47.6	Heroku (41.3)	+15%
PDF generation	WeasyPrint	38.7	wkhtmltopdf (33.3)	+16%
Email transport	Resend	31.4	Postmark (28.6)	+10%
Auth	Django sessions	(built-in; no scorecard)	—	—

→ Django 5.1 + Postgres 17 + Render + WeasyPrint + Resend. Versions
pinned in spec/01-tech-stack.md §9.

Anti-picks recorded with reasons (Rails: Priya doesn't know it; Cloudflare:
server-render mismatch; Go/Rust: overkill / no familiarity; Supabase:
4 lock-in vectors with no win; Vercel: serverless mismatch; Firebase:
cost nonlinearity; Self-host: ops time).

Cost band at 1 / 3 / 10 / 25 firms: $14 / $42 / $160 / $900 monthly
(see §10). Gross margin 90-96% across the range; at 25 firms ops time
becomes the binding constraint, not infra cost, and that's the
re-evaluation trigger.
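
The Margin column in the scorecard summary appears to be the pick's lead over the runner-up, expressed as a percentage of the runner-up's weighted score. That definition is inferred from the numbers rather than stated anywhere, so treat the sketch below as a cross-check under that assumption; `margin_pct` is a hypothetical helper name.

```python
def margin_pct(pick: float, runner_up: float) -> int:
    """Lead of the pick over the runner-up, as a rounded % of the runner-up."""
    return round((pick - runner_up) / runner_up * 100)


# (pick score, runner-up score) copied from the scorecard summary.
SCORECARDS = {
    "Backend framework": (51.4, 47.6),
    "Database": (45.9, 41.7),
    "Host": (47.6, 41.3),
    "PDF generation": (38.7, 33.3),
    "Email transport": (31.4, 28.6),
}

for layer, (pick, runner_up) in SCORECARDS.items():
    print(f"{layer}: +{margin_pct(pick, runner_up)}%")
```

Under this reading all five rows reproduce the published margins (+8%, +10%, +15%, +16%, +10%).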

Day 1 — /oc-scaffold writes the project

Per scaffold-guide.md § Django. Settings split, render.yaml Blueprint, pytest, Django admin all wired. 41 files written; 18 tests pass; runserver boots.

oc-app-architect

claude oc-app-architect

/oc-scaffold writes 41 files from scaffold-guide.md § Django:

Settings split into core/settings/{base,dev,prod}.py (DJANGO_SETTINGS_MODULE selects)
render.yaml Blueprint provisioning Postgres + web service + a staging variant, with worker / redis / cron stubs commented out for Sprint 4+
pytest.ini pointing at dev settings; markers for slow and integration
Three apps scaffolded: firms (Firm + Partner + Engagement models), billables (placeholder; Sprint 1 fleshes), invoices (placeholder; Sprint 2-3 fleshes), plus a thin auth app on top of django.contrib.auth
Django admin registered for every model with sensible defaults (list_display, search_fields, list_filter)
core/views.py with /health returning SHA + DB ping (Render's healthCheckPath)
Selected file contents in the artifact include the actual base.py, prod.py, firms/models.py, views.py, pytest.ini, and requirements.txt

.opchain/pm.yaml written for GitHub Issues (not Linear) — same protocol §3 markers, different tool names. Provider swap is a single config line, see the pm-yaml-gh artifact for the full mapping.

Verifying:

$ pytest -q
.................. [100%]
18 passed in 1.42s
$ python manage.py runserver
Starting development server at http://127.0.0.1:8000/
$ curl http://127.0.0.1:8000/health
{"ok": true, "service": "quill", "sha": "dev", "db_ping_ms": 4}

All green. Ready to start Sprint 1.

Wait — when I run pytest locally I get `ImportError: cannot import name 'Firm' from 'apps.firms'`. Did something not get written?

claude oc-app-architect

Good catch. That's an INSTALLED_APPS ordering issue, not a missing
file. Django imports apps in the order listed, and apps.billables
references apps.firms.Partner — if billables loads first, the
import chain fails.

Looking at core/settings/base.py, the current order:

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    ...
    "apps.billables",   # references Partner before firms is loaded
    "apps.firms",
    "apps.invoices",
    "apps.auth",
]

Reordering so firms comes first:

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    ...
    "apps.firms",       # tenant root — load first
    "apps.billables",
    "apps.invoices",
    "apps.auth",
]

Now pytest should run clean. Migration order is a separate matter:
Django derives it from each migration's declared dependencies
(inferred from FKs), not from INSTALLED_APPS. A clean INSTALLED_APPS
ordering still makes the errors more legible when something goes
wrong. Adding a comment in base.py so future-you doesn't trip on
this again.
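
One way to keep that ordering from regressing is a small guard in the test suite. The sketch below is an assumption about how such a guard could look, not something the scaffold is stated to ship: `misordered_apps` is a hypothetical helper, and the dependency map is maintained by hand with only the one edge the transcript mentions (billables references firms).

```python
def misordered_apps(
    installed_apps: list[str], depends_on: dict[str, set[str]]
) -> list[str]:
    """Return the apps that are listed before an app they depend on."""
    position = {app: index for index, app in enumerate(installed_apps)}
    return [
        app
        for app, deps in depends_on.items()
        if any(position[dep] > position[app] for dep in deps)
    ]


# Only edge stated in the transcript: billables references firms.Partner.
DEPENDS_ON = {"apps.billables": {"apps.firms"}}

BEFORE = ["apps.billables", "apps.firms", "apps.invoices", "apps.auth"]
AFTER = ["apps.firms", "apps.billables", "apps.invoices", "apps.auth"]

print(misordered_apps(BEFORE, DEPENDS_ON))  # ['apps.billables']
print(misordered_apps(AFTER, DEPENDS_ON))   # []
```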

Day 8 — first PR opens, linked back to issue #5

oc-git-ops uses GitHub Issues exactly like it uses Linear. Same protocol, different tool names.

oc-git-ops

/oc-git-sync inv-org/quill#5

claude oc-git-ops

Reading issue #5 via mcp__mcp-server-github__issue_read. Title: "Sprint 1: time-entry foundations." Type: type:feature (from issue label).

Branch: feat/quill-5-time-entry-foundations
Commit subject: feat(billables): time-entry CRUD + bulk import
PR body: auto-generated from the issue + diff summary + audit gate output (see artifact)

Pre-write check for the PR-opened comment marker:

mcp__mcp-server-github__list_issue_comments(owner=inv-org, repo=quill, number=5)
  → marker <!-- opchain:oc-git-ops:pr-opened:#1 --> not present → safe to write

mcp__mcp-server-github__add_issue_comment posts the marker comment back to issue #5.

mcp__mcp-server-github__issue_write adds label status:in-review (the GH-issues equivalent of Linear's In Review state per the v1.3 states map in pm.yaml).

PR #1 opened with 14 new tests, audit gate green, performance benchmarks documented. Same shape as the Linear scenarios — protocol §3 abstracts the provider.
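
The pre-write check above is an idempotency guard: look for the marker before posting, so a retried run never leaves a duplicate comment. A minimal sketch of that logic follows. The marker layout is copied from the transcript; the function names are hypothetical, and the list of comment bodies is assumed to come from the `list_issue_comments` call.

```python
def marker(skill: str, event: str, ref: str) -> str:
    """Build the hidden HTML-comment marker embedded in a tracker comment."""
    return f"<!-- opchain:{skill}:{event}:{ref} -->"


def safe_to_write(comment_bodies: list[str], wanted: str) -> bool:
    """True when no existing comment already carries the marker."""
    return all(wanted not in body for body in comment_bodies)


pr_opened = marker("oc-git-ops", "pr-opened", "#1")
print(pr_opened)  # <!-- opchain:oc-git-ops:pr-opened:#1 -->

# First run: nothing posted yet, so the comment may be written.
print(safe_to_write(["Kicking off Sprint 1."], pr_opened))  # True

# Retry after a partial failure: the marker is already there, so skip.
already_posted = ["Kicking off Sprint 1.", f"PR opened. {pr_opened}"]
print(safe_to_write(already_posted, pr_opened))  # False
```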

Day 9 — first staging deploy via Render Blueprint

oc-deploy-ops detects render.yaml, walks the audit gate, ships. First deploy provisions Postgres + web service in one push.

oc-deploy-ops

/oc-deploy staging

claude oc-deploy-ops

Provider detected: Render (render.yaml present at repo root).

Audit gate: pytest PASS (147/147), mypy PASS, bandit PASS, oc-bug-check PASS.

git push render main — Render reads the Blueprint, provisions Postgres + web service in 47s, runs migrate --noinput as the preDeployCommand, swaps traffic.

/health returns 200 + SHA bb1f0e2 ✓.

Pre-create check on the deploy ticket:
mcp__mcp-server-github__list_issues(owner=inv-org, repo=quill, labels=["type:deploy"], body_text_query="opchain:oc-deploy-ops:deploy-created:staging:bb1f0e2") → no match → safe to create.

Created issue #14 with the marker in the description. Posted linked-shipped comments back to #1, #3, #4, #5 (the issues whose commits made it into this deploy range).

Same flow as the Linear scenarios. Same markers. Different tool surface.

What does the cost look like at 10 firms? I'm thinking ahead.

claude oc-deploy-ops

Cost band from the stack decision (§10) — repeating here against the
real Render dashboard numbers I'm seeing post-deploy:

Firms	Render web	Render Postgres	Resend	Total / mo	Revenue at $350 avg	Gross margin
1 (today)	$7	$7	$0	$14	$350	96%
3	$21	$21	$0	$42	$1,050	96%
10	$70	$70	$20	$160	$3,500	95%
25	$425	$425	$50	$900	$8,750	90%

At 10 firms, infrastructure is ~5% of revenue — you're effectively
unconstrained by infra. The binding constraint becomes your ops
time per firm. We projected ≤ 1 hour per firm per month in the
discovery doc; if that holds at 10 firms, you're spending ~10 hours
a month on ops to net ~$3,300/mo in margin. That's the slope of the
solo-founder math.

The re-evaluation trigger fires at 25 firms — at that point the
per-firm install model starts to dominate ops time, and we'd
consider a single-instance SaaS architecture with strict tenant
isolation. That's a 6-12 month problem from now, not a day-9 problem.

Watch the R3 risk row in the discovery doc — that's the one we
instrument from day 1.
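
The totals and margins in the cost table can be re-derived from its own columns. The sketch below does only that arithmetic; the $350 average revenue per firm is the table's stated assumption, and `summarize` is a hypothetical helper name.

```python
AVG_REVENUE_PER_FIRM = 350  # from the "Revenue at $350 avg" column

# (firms, Render web, Render Postgres, Resend), monthly USD, from the table.
ROWS = [
    (1, 7, 7, 0),
    (3, 21, 21, 0),
    (10, 70, 70, 20),
    (25, 425, 425, 50),
]


def summarize(firms: int, web: int, postgres: int, resend: int) -> tuple[int, int, int]:
    """Return (total infra cost, revenue, gross margin %) for one row."""
    total = web + postgres + resend
    revenue = firms * AVG_REVENUE_PER_FIRM
    margin_pct = round((1 - total / revenue) * 100)
    return total, revenue, margin_pct


for row in ROWS:
    total, revenue, margin_pct = summarize(*row)
    print(f"{row[0]:>2} firms: ${total}/mo on ${revenue} revenue -> {margin_pct}% margin")
```

The four rows come out at 96%, 96%, 95% and 90%, so infrastructure only starts to bite at the 25-firm tier, where the per-service prices step up.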

Day 14 — Sarah's firm signs

Production deploy. Quill is live at https://quill.priyabuilds.com. Two weeks from ideation to first paying customer.

Sarah signed. /oc-deploy prod. What changed about the pipeline because we used Django + Render + GitHub Issues instead of CF Workers + Linear?

claude oc-integrations-engineer

Prod deploy 31s; /health returns SHA d18ca44. Sarah's firm
onboarded (7 partners, 4 engagements, first test invoice PDF emailed
and confirmed). Deploy-ticket #14 closed with the prod-shipped marker.

On the v1.3 protocol comparison: surprisingly little changed.

The protocol abstracts the provider — every PM-aware skill calls a
registry-resolved verb (add_comment, create_issue, etc.) and the
registry maps to the concrete MCP tool. Same markers, same retry policy,
same deferred-action queue. The provider-specific things were:

pm.yaml provider line + state mapping. GH issues are open/closed;
the states map points at GH labels (status:in-progress,
status:in-review) for non-terminal states and closed for the
terminal one. Linear's state map points at workflow names. One-line
config swap.
oc-deploy-ops Render section. Different from the Cloudflare
walkthrough — render.yaml Blueprint instead of wrangler deploy,
preDeployCommand instead of a separate migration step, Render's
fromDatabase reference instead of wrangler secret put DATABASE_URL.
Scaffold layout. Django's apps/ + core/settings/{base,dev,prod} +
templates/ pattern instead of Astro src/pages + src/components.
Audit gate composition. pytest + mypy + bandit instead of
vitest + tsc + npm audit. oc-code-auditor's checks adapt to the
detected stack; oc-bug-check's check list reads from
scaffold-guide.md § <stack> per-language.
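
The first item, the provider line plus state mapping, can be pictured as two lookup tables. This is an illustrative sketch only, not the actual pm.yaml schema or registry code. The GitHub tool names and label names are the ones that appear in this walkthrough; the verb names other than `add_comment`, the Linear workflow names other than "In Review", and both function names are assumptions.

```python
# Registry: provider-neutral verb -> concrete MCP tool (GitHub side only,
# since these are the tool names that appear in the walkthrough).
GITHUB_TOOLS = {
    "add_comment": "mcp__mcp-server-github__add_issue_comment",
    "list_comments": "mcp__mcp-server-github__list_issue_comments",
    "read_issue": "mcp__mcp-server-github__issue_read",
    "update_issue": "mcp__mcp-server-github__issue_write",
}

# State map: logical state -> (mechanism, value) per provider.
STATE_MAPS = {
    "github": {
        "in_progress": ("label", "status:in-progress"),
        "in_review": ("label", "status:in-review"),
        "done": ("state", "closed"),  # the one terminal state
    },
    "linear": {
        "in_progress": ("workflow", "In Progress"),  # assumed name
        "in_review": ("workflow", "In Review"),
        "done": ("workflow", "Done"),  # assumed name
    },
}


def resolve_tool(verb: str) -> str:
    """Map a provider-neutral verb to the GitHub MCP tool that performs it."""
    return GITHUB_TOOLS[verb]


def state_change(provider: str, state: str) -> tuple[str, str]:
    """Describe how a logical state is expressed on a given provider."""
    return STATE_MAPS[provider][state]


print(resolve_tool("add_comment"))
print(state_change("github", "in_review"))  # ('label', 'status:in-review')
print(state_change("linear", "in_review"))  # ('workflow', 'In Review')
```

A skill only ever asks for "in_review"; whether that becomes a label or a workflow state is decided by the table for the configured provider.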

Everything else — the discovery → spec → roadmap → build → ship pipeline,
the audit gate cadence, the PM-MCP loop, the cross-skill checkpoint
files — was identical. That's the v1.3 platform-expansion bet: opchain
stops being Cloudflare-only on the page without breaking what already
worked.

Two weeks from "build me an invoicing app" to live customer. Sarah's
April batch runs from the prod install starting next Monday.

Metric	Target	Source
Activation rate (signup → first session within 24h)	≥ 55%	events: `signup`, `session.started`
Day-7 retention (users who start at least one session)	≥ 35%	events: `session.started`
Median sessions per active user per week	≥ 6	aggregated over 7d
Team adoption (teams with ≥ 3 active users)	≥ 60% of paid teams	aggregate over teams
Push opt-in rate (when prompted)	≥ 45%	event: `push.permission.granted`
Time-to-first-session (signup → first start)	p50 ≤ 90s	session.started − user.created_at
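
As a worked example of the first metric, activation rate can be computed from the two events it names. This is a sketch under assumptions: events are taken to be already reduced to one signup timestamp and one first-session timestamp per user, the sample users are invented, and `activation_rate` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def activation_rate(
    signups: dict[str, datetime],
    first_sessions: dict[str, datetime],
    window: timedelta = timedelta(hours=24),
) -> float:
    """Share of signed-up users whose first session started within the window."""
    if not signups:
        return 0.0
    activated = 0
    for user_id, signed_up_at in signups.items():
        started_at = first_sessions.get(user_id)
        if started_at is not None and timedelta(0) <= started_at - signed_up_at <= window:
            activated += 1
    return activated / len(signups)


# Invented sample data: three signups, one session inside 24h, one outside.
t0 = datetime(2026, 4, 8, 9, 0)
signups = {"ana": t0, "ben": t0, "cy": t0}
first_sessions = {
    "ana": t0 + timedelta(hours=1),
    "ben": t0 + timedelta(hours=30),
}

rate = activation_rate(signups, first_sessions)
print(f"activation {rate:.0%}; meets the 55% target: {rate >= 0.55}")
```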

Flow	Entry	Steps	Exit
Signup	landing → "Start your team"	email → magic link → name your team → invite teammates (optional)	team board
Join existing team	invite link	accept → magic link → onboard	team board
Start a session	board → "Focus" button	duration picker (15/25/45/60) → optional intent text	in-progress view
Watch teammates	board (default home)	polls every 30s; updates avatars without page reload	—
Receive nudge	browser push on session end	"break together" CTA → opens "/break" room	board
End early	in-progress view → "End now"	confirm → marks status=cancelled (counts toward stats)	board
Invite	settings → "Invite"	copy link or paste emails	invitation email queued
Leave team	settings → "Leave"	confirm	landing

#	Screen	Route	Notes
1	Landing	`/`	Marketing copy + "Start a team" CTA. Server-rendered, no JS bundle on first paint.
2	Magic-link request	`/auth/magic`	Form posts to `/api/auth/request`. Renders "check your email" state.
3	Magic-link redeem	`/auth/redeem?token=…`	One-time token; sets session cookie; redirects to board.
4	Team board (home)	`/board`	Avatars + "in focus" state. Empty state when nobody is heads-down.
5	New session picker	`/focus`	4 duration buttons + optional "what are you working on" input.
6	In-progress view	`/focus/:sessionId`	Live countdown. Timer from server clock — drift-correcting every 5s.
7	Session ended	`/focus/:sessionId/done`	"Break together" prompt + return to board.
8	Settings	`/settings`	Profile, push permission status, leave team.
9	Invite	`/settings/invite`	Copy link, paste emails, see pending invites.
10	Billing stub	`/billing`	"You're on the free tier" + waitlist for paid. (No Stripe in v1.)
11	404	catch-all	Branded; preserves header + footer.
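
The in-progress view (screen 6) takes its countdown from the server clock and corrects drift every 5 seconds. The arithmetic behind that can be sketched as follows; this is an assumption about the approach, not the shipped client code, and both function names are hypothetical. Times are plain epoch seconds.

```python
import math


def clock_offset(server_now: float, client_now: float) -> float:
    """How far the server clock is ahead of the client clock, in seconds."""
    return server_now - client_now


def remaining_seconds(started_at: float, duration_sec: int, server_now: float) -> int:
    """Seconds left in a session, measured on the server's clock only."""
    elapsed = server_now - started_at
    return max(0, math.ceil(duration_sec - elapsed))


# A 25-minute session started at server time 1000.0.
started_at, duration_sec = 1000.0, 1500

# At a resync the client learns the server is 2.5 s ahead of its own clock.
offset = clock_offset(server_now=1060.0, client_now=1057.5)

# Between resyncs the client estimates server time from its own clock.
estimated_server_now = 1100.0 + offset
print(remaining_seconds(started_at, duration_sec, estimated_server_now))  # 1398
```

Because the remaining time is always derived from `started_at` rather than decremented locally, a closed or throttled tab cannot make the countdown fall behind.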

Table	Field	Constraint
User	email	unique, lowercased on insert, max 254 chars (RFC 5321)
User	display_name	1–48 chars, allows unicode, trimmed
Team	slug	`/^[a-z0-9-]{3,32}$/`, generated, never user-input
Team	name	1–64 chars
Session	duration_sec	enum: 900, 1500, 2700, 3600
Session	status	enum: in_progress, ended, cancelled
Session	intent_text	optional, 0–140 chars
Membership	role	enum: member, lead
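
The constraints in this table translate fairly directly into validation helpers. The sketch below is one possible reading, with hypothetical function names; it covers only the User, Team and Session rows, and it checks lengths in characters as the table states them.

```python
import re

SLUG_PATTERN = re.compile(r"[a-z0-9-]{3,32}")
DURATIONS_SEC = {900, 1500, 2700, 3600}  # 15, 25, 45, 60 minutes


def normalize_email(raw: str) -> str:
    """Lowercase on insert and enforce the 254-character ceiling."""
    email = raw.lower()
    if not email or len(email) > 254:
        raise ValueError("email must be 1-254 characters")
    return email


def clean_display_name(raw: str) -> str:
    """Trim, then require 1-48 characters (unicode allowed)."""
    name = raw.strip()
    if not 1 <= len(name) <= 48:
        raise ValueError("display_name must be 1-48 characters after trimming")
    return name


def is_valid_slug(slug: str) -> bool:
    """fullmatch avoids the trailing-newline loophole of a '$' anchor."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def validate_session(duration_sec: int, intent_text: str = "") -> None:
    """Raise ValueError unless the duration and intent text are allowed."""
    if duration_sec not in DURATIONS_SEC:
        raise ValueError("duration_sec must be one of 900, 1500, 2700, 3600")
    if len(intent_text) > 140:
        raise ValueError("intent_text must be at most 140 characters")


print(normalize_email("Priya@Example.com"))  # priya@example.com
print(is_valid_slug("focus-crew-42"))        # True
print(is_valid_slug("No_Caps"))              # False
```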

Action	Member	Lead
Read team board	✓	✓
Start own session	✓	✓
Cancel own session	✓	✓
Invite	✓	✓
Remove a member	✗	✓ (v2)
Rename team	✗	✓ (v2)
Delete team	✗	✓ (v2)
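
The permission matrix is small enough to encode as data. The sketch below assumes that "✓ (v2)" means the action is lead-only and not available to anyone until v2 ships; that reading, the action keys and the `can` helper are all assumptions.

```python
ALLOWED_ROLES = {
    "read_board": {"member", "lead"},
    "start_own_session": {"member", "lead"},
    "cancel_own_session": {"member", "lead"},
    "invite": {"member", "lead"},
    "remove_member": {"lead"},
    "rename_team": {"lead"},
    "delete_team": {"lead"},
}

# Rows marked "(v2)" in the matrix.
V2_ACTIONS = {"remove_member", "rename_team", "delete_team"}


def can(role: str, action: str, v2_enabled: bool = False) -> bool:
    """Check one cell of the matrix, honouring the v2 gate."""
    if action in V2_ACTIONS and not v2_enabled:
        return False
    return role in ALLOWED_ROLES[action]


print(can("member", "invite"))                          # True
print(can("lead", "rename_team"))                       # False until v2
print(can("lead", "rename_team", v2_enabled=True))      # True
print(can("member", "rename_team", v2_enabled=True))    # False
```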

Metric	Budget	Strategy
Largest Contentful Paint (mobile)	≤ 1.8 s	server-rendered HTML; critical CSS inline; no blocking JS in head.
First Input Delay	≤ 80 ms	hydration deferred until idle; the timer is signal-only, progressively enhanced.
Cumulative Layout Shift	≤ 0.05	font-size locked from start; image dimensions declared.
JS bundle (board route)	≤ 38 KB gzipped	SvelteKit; no dependencies above 4 KB allowed without justification.
Worker CPU / request	≤ 12 ms p95	board read served from edge cache; only Durable Object writes pay full latency.

Event	Payload	Sampling
`signup`	`{team_new: bool}`	100%
`session.started`	`{duration_sec, has_intent: bool}`	100%
`session.ended`	`{duration_sec, completed_pct}`	100%
`session.cancelled`	`{duration_sec, completed_pct, reason?}`	100%
`push.permission.prompted`	`{}`	100%
`push.permission.granted`	`{}`	100%
`push.permission.denied`	`{}`	100%
`push.delivered`	`{recipients}`	100%
`board.view`	`{teammates_visible}`	10%
`board.poll`	`{}`	0% (counter only via metrics)

SLI	Window	Target
Board availability (`/api/teams/:slug/board` 2xx rate)	28d rolling	≥ 99.9%
Session-start success	28d rolling	≥ 99.95%
Push delivery latency (DO commit → push enqueued)	p99	≤ 500 ms
Magic-link delivery (request → email received)	p95	≤ 8 s
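
Each availability target implies an error budget. The first two SLIs are request-based (share of successful calls), so the budget is a number of failed requests; the time-based figure is only a rough equivalent that assumes evenly spread traffic. Both helpers below are hypothetical, the request volume is invented, and `Decimal` is used because binary floats mis-round targets such as 99.95%.

```python
from decimal import Decimal


def allowed_failures(total_requests: int, target_pct: str) -> int:
    """Failed requests the window can absorb before the target is missed."""
    budget = (Decimal(100) - Decimal(target_pct)) / Decimal(100)
    return int(total_requests * budget)


def budget_minutes(window_days: int, target_pct: str) -> Decimal:
    """Rough time equivalent of the budget, assuming evenly spread traffic."""
    window_minutes = Decimal(window_days * 24 * 60)
    return window_minutes * (Decimal(100) - Decimal(target_pct)) / Decimal(100)


# Invented volume: one million requests in the 28-day window.
print(allowed_failures(1_000_000, "99.9"))   # 1000 (board availability)
print(allowed_failures(1_000_000, "99.95"))  # 500  (session-start success)
print(budget_minutes(28, "99.9"))            # 40.32 minutes
print(budget_minutes(28, "99.95"))           # 20.16 minutes
```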

#	Risk	Likelihood	Impact	Mitigation
R1	Push unreliable on iOS Safari	Med	High	First-tap permission flow; in-app fallback toast on session end.
R2	DO single-region adds latency for APAC users	Med	Med	Accept for v1; revisit after first APAC-heavy team onboards.
R3	50-team soft cap exceeded before billing exists	Low	High	Cap is enforced server-side; over-cap signups go to a waitlist.
R4	Magic-link email goes to spam	Med	Med	SPF/DKIM/DMARC set on day one; Resend domain warmed for 14d before launch.
R5	User runs the PWA in a backgrounded tab and misses the push	Med	Low	Push fires regardless of tab state; fallback to email digest if push fails (v2).
R6	DO cold start adds 200ms to first session of the day	Low	Low	Acceptable; surfaced in observability.

Version	Date	Change
0.1	2026-04-08	Phase 1 discovery output.
0.2	2026-04-09	Stack-forge handoff folded in.
0.3	2026-04-10	UX style book + screen list integrated.
0.9	2026-04-11	Sprint plan + checkpoint structure.
1.0	2026-04-12	Phase 7 sign-off; this is the export-ready snapshot.

Criterion	Weight	Why
Mobile weight	1.5×	30% of users are on mobile; bundle is the difference between snap and stutter.
Server-authoritative timer	1.5×	Core differentiator: closing the tab cannot stop the block.
Cost @ 50 teams	1.0×	The soft cap defines our `v1` budget envelope.
Hydration / runtime cost	1.0×	Timer animation is visible; jank is unforgivable.
Tooling maturity	0.7×	Solo founder; documentation > novelty.
Lock-in (exit cost)	0.7×	We will not refactor in year 1; we want options in year 2.

Candidate	Mobile	Hydration	Tooling	Hire pool	Lock-in	Weighted
SvelteKit	9	10	8	7	8	44.6
Next.js	6	6	10	10	6	41.0
Remix	7	8	8	8	7	39.4
Nuxt	7	7	7	7	7	35.5
Astro + islands	8	9	7	5	8	38.4

Candidate	Cold-start	Timer primitive	Cost @ 50 teams	Tooling	Lock-in	Weighted
CF Workers + D1 + DO	10	10 (DO)	10 ($0)	7	7	49.0
Fly.io + Postgres + Redis	6	8 (cron + Redis)	6 ($24/mo)	8	9	38.6
Supabase	6	6 (Edge fn)	7 ($25/mo over free)	8	5	35.4
Vercel + Neon + QStash	7	7	6	9	6	37.0
Lambda + RDS + EventBridge	5	7	4	6	6	32.0

Candidate	Mobile UX	Cost @ 50 teams	Lock-in	DX	Weighted
Magic link via Resend (in-house)	9	10 ($0 within free)	9	8	34.7
Auth0	7	5	4	9	24.5
Clerk	8	5	4	10	25.6
WorkOS	8	4	6	8	25.5

Candidate	Workers integration	Cost	Lock-in	Weighted
CF Workers Logs + Analytics Engine + Logpush → Tinybird (later)	10	10	9	28.4
Datadog	9	4	4	17.0
Honeycomb	8	6	6	19.4
Sentry (errors only)	9	10 (free tier)	8	27.0

Candidate	Deliverability	Cost @ 50 teams	DX	Weighted
Resend	9	10	9	27.5
Postmark	9	7	8	23.5
AWS SES	8	10	5	22.0

Axis	What it checks
Correctness	Does the code do what the spec says? Tests pass? Edge cases handled?
Security	Authn/authz, input validation, rate-limits, secret hygiene.
Performance	Bundle budget, query plans, hot-path allocation.
Operability	Logs, metrics, alerts, rollback path, runbook implications.
Style + a11y	Lint, type-check, axe-core, design-system adherence.

#	Principle	Operational test
P1	One question, one answer, top-of-fold.	An exec answers the OKR question in < 2s, no scroll.
P2	Small multiples over big singles for comparison.	If two metrics are comparable, they share a column + a y-scale.
P3	Data-ink ratio ≥ 0.7.	No 3D, no gradient fills, no chart shadows. Tufte rule.
P4	Color = meaning.	Three hues max, each with one semantic. Glyphs carry redundancy.
P5	Consistent baselines.	If two charts are comparable, their y-axes share a scale.
P6	Mobile-first.	390×844 reference frame; the primary layer fits above the fold.
P7	Numeric format consistency.	One format per metric class (count, percentage, currency).
P8	WCAG 2.2 AA on every chart.	Contrast ≥ 4.5:1 for text; redundant encoding for any chromatic signal.
P9	Screen-reader-first reading order.	DOM order matches visual order; charts have alt-text data summaries.

Hue	Token	Meaning
Leaf (`#2f8a57`)	`--ok`	on-track
Sand (`#c3a64f`)	`--watch`	trending — within tolerance but moving wrong way
Clay (`#d96b3a`)	`--off`	off-track

Trigger	Reconsider
> 200 paying teams	D1 row limits + DO single-region latency for APAC. Possibly migrate to Postgres + multi-region DO.
Slack bot promoted to first-class	Auth + permission model expands; revisit auth provider for OAuth-into-Slack flow.
Push fan-out backlog > 5,000/min	DO write contention; revisit with Workers Queues fan-out pattern.
Founder hires a 2nd engineer	Reconsider Next.js for the larger hire pool.

Flag	Default	When to flip
`FLAG_PUSH`	on	Push fan-out failing or backlog growing. Sessions still work; nudges silenced.
`FLAG_BILLING`	on	Stripe outage. Existing subscriptions unaffected; new signups go to waitlist.
`FLAG_SIGNUP`	on	Surprise signup spike or 50-team soft cap pressure; routes `/auth/request` to waitlist UI.
`FLAG_DO_TIMERS`	on	Durable Object regional outage. Sessions fall back to client-side timers (degraded — they stop on tab close). Last-resort.
`FLAG_BOARD_POLL`	on	Hot read storm. Disables 30s polling; users see static board until they manually refresh.
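
All five flags default to on and are flipped off only during an incident, so a reader for them should fail open and reject unknown names. The sketch below assumes the flags are plain environment-style strings, which the table does not state; the helper name and the accepted "off" spellings are hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Optional

KNOWN_FLAGS = (
    "FLAG_PUSH",
    "FLAG_BILLING",
    "FLAG_SIGNUP",
    "FLAG_DO_TIMERS",
    "FLAG_BOARD_POLL",
)
OFF_VALUES = {"0", "off", "false", "no"}  # assumed spellings


def flag_enabled(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return the flag state; anything unset or unrecognised stays on."""
    if name not in KNOWN_FLAGS:
        raise KeyError(f"unknown flag: {name}")
    source = os.environ if env is None else env
    return source.get(name, "on").strip().lower() not in OFF_VALUES


print(flag_enabled("FLAG_PUSH", {}))                        # True (default)
print(flag_enabled("FLAG_PUSH", {"FLAG_PUSH": "off"}))      # False
print(flag_enabled("FLAG_SIGNUP", {"FLAG_PUSH": "off"}))    # True
```

Rejecting unknown names catches a mistyped flag at the call site instead of silently treating it as on.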

SLI	Window	Target	Alert
5xx rate	5 min	< 0.1%	> 1% over 5 min → PagerDuty page
Board p99 latency	5 min	< 200 ms	> 500 ms over 10 min → Slack
DO alarm-miss rate	24h	< 0.01%	any miss → Slack; > 5/day → page
Push fan-out backlog	live	< 100 queued	> 500 → Slack; > 5,000 → page
Magic-link delivery (request → received)	24h	p95 < 8s	p95 > 30s → Slack
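
Two rows in this table have tiered alerts (Slack first, then a page). The routing can be sketched as plain threshold checks; the thresholds are copied from the table, while the function names and the "none" / "slack" / "page" return values are assumptions.

```python
def backlog_alert(queued: int) -> str:
    """Push fan-out backlog: > 500 goes to Slack, > 5,000 pages."""
    if queued > 5000:
        return "page"
    if queued > 500:
        return "slack"
    return "none"


def alarm_miss_alert(misses_today: int) -> str:
    """DO alarm misses: any miss goes to Slack, more than 5 in a day pages."""
    if misses_today > 5:
        return "page"
    if misses_today > 0:
        return "slack"
    return "none"


for queued in (80, 500, 501, 5001):
    print(queued, backlog_alert(queued))
for misses in (0, 1, 6):
    print(misses, alarm_miss_alert(misses))
```

Note the gap between target and alert: a backlog between 100 and 500 already misses the "< 100 queued" target but stays silent.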

Layer	Question	Answered in
0	"Did anything change this week?"	< 1 s
1	"Which OKRs are on/off track?"	< 5 s
2	"What's the supporting context?"	< 15 s
3	"What's the trend on this OKR?"	on demand

Component	Layer	Props	Notes
`WeekSummary`	0	`{bullets: Bullet[]}`	Pure server-rendered.
`OkrRow`	1	`{okr: OkrSnapshot}`	Click → opens drawer. Keyboard: Enter/Space.
`OkrBar`	1	`{value, target, inverted?}`	Subcomponent of OkrRow.
`SmallMultiple`	2	`{metric: MetricSnapshot}`	11-bin sparkline.
`OkrDrawer`	3	`{okrId}`	Lazy-loads the 3 charts on open.
`HistoricalTrendChart`	3	`{points, planLine}`	Reused from existing chart kit.
`StackedAreaChart`	3	`{series, stackKey}`	New.
`CohortMatrix`	3	`{cohorts}`	Small-multiple grid.

Axis	Weight	What it measures
Data-ink ratio	1.0×	(data pixels) / (total ink). Tufte. Higher = less chrome.
Hierarchy signal	1.0×	Eye-tracking sim — does attention concentrate on the primary KPI? Higher = more fixation.
Color semantic consistency	1.0×	Each hue maps to exactly one meaning. Score = 1 − (violations / hue count).
Encoding minimalism	1.0×	Number of chart types ÷ number of charts. Lower = simpler.
Typography hierarchy	1.0×	Distinct type sizes mapped to distinct roles.
Numeric format consistency	1.0×	Each metric class formatted identically across the screen.
Comparison ergonomics	1.0×	Comparable charts are spatially adjacent and y-scale aligned.
Layer separation	1.0×	Above-fold answers the primary question in <5s.
WCAG AA pass rate	1.0×	Fraction of chart elements meeting AA contrast.
Mobile fit	1.0×	Primary layer fits 390w viewport above fold.
Single-question test	1.0×	Eye-tracking sim — user identifies primary KPI in ≤ 10s?
Screen-reader order	1.0×	DOM order matches visual reading order.

Axis	Score	Notes
Data-ink ratio	3.2 / 10	3D, gradient fills, drop shadows on every chart.
Hierarchy signal	1.1 / 10	Eye attention spread evenly across 14 charts.
Color semantic consistency	4.0 / 10	Red used for both "good" (low error rate) and "bad" (high churn).
Encoding minimalism	3.6 / 10	5 chart types across 14 charts.
Typography hierarchy	4.0 / 10	Titles + axis labels both 14px; no scale.
Numeric format consistency	5.0 / 10	Mixed `12,304` / `12.3K` / `12,304.00`.
Comparison ergonomics	4.5 / 10	Comparable trends rendered as separate full-width charts.
Layer separation	2.0 / 10	One canvas, no hierarchy.
WCAG AA pass rate	7.9 / 10	11 of 14 charts pass; 3 fail.
Mobile fit	5.0 / 10	Primary layer requires 2 vertical scrolls on 390w.
Single-question test	0 / 10	FAIL — user cannot identify the primary KPI in < 10s.
Screen-reader order	6.0 / 10	DOM matches visual but charts have no alt-text.

Axis	Score	Notes
Data-ink ratio	7.1 / 10	Up from 3.2: no 3D, no gradient, sparkline-only at Layer 2.
Hierarchy signal	9.4 / 10	Eye-tracking sim concentrates on Layer 1 OKR row 1.
Color semantic consistency	10 / 10	3 hues, one semantic each (`--ok`, `--watch`, `--off`).
Encoding minimalism	9.0 / 10	2 chart types (bar + sparkline) across 8 OKR rows + 8 small multiples.
Typography hierarchy	9.0 / 10	Display / body / mono with consistent role mapping.
Numeric format consistency	10 / 10	One format per metric class enforced by component prop.
Comparison ergonomics	9.5 / 10	Layer 2 small multiples share width + y-normalisation.
Layer separation	10 / 10	Layer 0 + Layer 1 answer the primary question in < 2s.
WCAG AA pass rate	10 / 10	14/14. Verified by axe-core.
Mobile fit	9.0 / 10	Layer 0 + Layer 1 fit above the fold on 390w.
Single-question test	10 / 10	PASS in eye-tracking sim.
Screen-reader order	9.5 / 10	DOM matches visual; sparklines have hidden table fallback.

Metric	Old	New	Delta
Time-to-first-fixation	4.1 s	1.2 s	−71%
Mean attention dwell on primary KPI	0.4 s	2.8 s	+600%
Number of fixations to answer "is OKR 1 on track?"	11	2	−82%
Self-reported "I know my OKR status" rate after 5s	22%	94%	+327%

Metric	Old	New
Lighthouse mobile performance	62	91
Lighthouse mobile a11y	73	100
axe-core violations	14	0
Total page weight	412 KB	26 KB (Layer 3 drawer +12 KB on demand)
LCP (4G mobile sim)	4.8 s	1.3 s

File	Change	LoC
`app/dashboard/page.tsx`	rewrite	~140
`components/dashboard/WeekSummary.tsx`	new	~60
`components/dashboard/OkrRow.tsx`	new	~80
`components/dashboard/OkrBar.tsx`	new	~50
`components/dashboard/SmallMultiple.tsx`	new	~70
`components/dashboard/OkrDrawer.tsx`	new	~110 (lazy-loaded)
`lib/dashboard/snapshots.ts`	new	~90
`lib/dashboard/format.ts`	new	~40 (number/percent formatters)
`tests/components/dashboard/*.test.tsx`	new	~280 (8 component test files)
`tests/e2e/dashboard.spec.ts`	new	~110 (Playwright)
`app/dashboard/legacy/page.tsx`	move existing dashboard here	(just a move)

Phase	Audience	Duration	Gate
Internal	team only	1 day	manual smoke
10%	random sampling	3 days	feedback ≤ 2 issues; 0 errors in Sentry
50%	scaled out	4 days	LCP/p95 within 10% of legacy; complaint rate ≤ baseline
100%	everyone	indefinite	—
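
For the 10% and 50% phases, one common way to pick the audience is deterministic bucketing: hash each user into one of 100 buckets so the same user keeps the same variant, and anyone admitted at 10% is still admitted at 50%. The plan only says "random sampling", so the hashing scheme, the salt and the function names below are assumptions; the "new" / "legacy" variant names match the `dashboard.viewed` event.

```python
import hashlib


def bucket(user_id: str, salt: str = "dashboard-rollout") -> int:
    """Stable bucket in [0, 100) derived from a salted hash of the user id."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def variant(user_id: str, rollout_pct: int) -> str:
    """'new' for users whose bucket falls under the current rollout percentage."""
    return "new" if bucket(user_id) < rollout_pct else "legacy"


users = [f"user-{n}" for n in range(1000)]
for pct in (0, 10, 50, 100):
    share = sum(variant(u, pct) == "new" for u in users) / len(users)
    print(f"rollout {pct:>3}% -> {share:.1%} of sample users on the new dashboard")
```

The observed share will hover near the configured percentage rather than match it exactly, which is usually fine for a staged rollout.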

Event	Payload	Why
`dashboard.viewed`	`{flag_variant: "new" | "legacy"}`
`okr_row.expanded`	`{okr_id, source}`	which OKRs people drill into
`drawer.chart_viewed`	`{okr_id, chart_type}`	which deep-dive charts get attention
`small_multiple.hovered`	`{metric_id}`	Layer 2 engagement

Risk	Mitigation
Layer 0 weekly summary copy is generated by an existing job; may fail	If `weeklySummary()` returns null, render Layer 1 only with a "summary unavailable" hint.
Bundle creep from sparkline rendering	Use the existing chart kit's tree-shakable sparkline export; verified at 4.2 KB.
Drawer state in URL conflicts with deep-link	Parse on mount; fall back to closed if invalid OKR id.
Eye-tracking sim disagrees with real users	Embed Hotjar (cookieless mode) for the 10% rollout; revisit if the sim was wrong.

Stat	Value
Languages	Ruby 89% · ERB 8% · JavaScript 3%
LoC	47,108 Ruby · 8,214 ERB · 1,983 JS
Test coverage (line)	58%
Tests	312 RSpec examples · 4 Capybara feature specs
Models	28
Controllers	41 (24 web · 17 API)
Background jobs	11 Sidekiq workers · 4 cron entries
Gems	84 (11 flagged out-of-date by `bundle outdated`)
DB	Postgres 13 · 47 tables · no materialized views confirmed
Hosting	Heroku Standard-2X (3 web dynos · 2 worker dynos)
External services	Stripe (broker→carrier payout), Twilio (driver SMS), AWS S3 (Documents)

Model	Lines	Tests	Notable
User	312	✓	Devise; STI parent for Dispatcher and CarrierUser.
Dispatcher	88	✓	STI subclass. belongs_to Broker.
Carrier	467	✓	One Carrier company; many CarrierUsers. `scope :active` is buggy (§13).
CarrierUser	124	✓	STI subclass of User.
Broker	198	✓	A brokerage firm; Dispatchers belong to it.
Shipper	156	partial	The customer paying for freight.
Load	904	✓	The work. has_many bids. State machine: posted → bid → awarded → in_transit → delivered → invoiced.
Bid	244	✓	Carrier offers a price for a Load. `negative price` validation missing (§13).
Shipment	612	✓	The awarded Load + execution state. Pickup/delivery timestamps.
Document	311	✓	BOL / POD / weigh ticket. `paperclip` (deprecated) → S3.
Invoice	778	✓	Multi-step state machine (draft → sent → viewed → paid / disputed / written_off). Heavy callbacks.
Payment	432	partial	Inbound from Shipper. Stripe + ACH + check (manual).
Settlement	812	0	Outbound to Carrier. 6 callbacks. Money flow. No tests. (CRITICAL — see §13.)
SettlementLineItem	188	0	Components of a Settlement. Lacks unique constraint on (settlement_id, source_id).
ChartOfAccount	64	0	Bookkeeping references. Apparently unused by code path; possibly legacy.
Lane	102	✓	Carrier's preferred origin/destination corridors.
Equipment	44	✓	Enum-ish lookup table (dry_van, reefer, flatbed, …).
Address	187	✓	Used by Load (origin, destination), Shipper, Carrier.
GeoLookup	56	✓	Cached zip→lat/lng. `SmartyStreets` gem.
FuelSurcharge	91	partial	Updated nightly; affects Bid pricing. The job that updates this has no failure alerting (§13).
MarginRule	142	✓	Per-shipper margin overrides.
Notification	89	✓	In-app notification feed.
AuditLog	144	partial	`paper_trail`-backed. Spotty coverage — only ~12 models opt in.
Webhook	78	✓	Outbound webhooks to integrations.
WebhookDelivery	102	✓	Delivery attempts + retries.
ApiToken	56	✓	Carrier-side API tokens. `bcrypt`-stored.
FeatureFlag	38	✓	Internal: `Flipper` UI gates a few half-shipped features.
Setting	44	✓	Singleton row of broker-wide config.
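
The Load row describes a linear state machine. As a language-neutral illustration (the audited codebase is Ruby, and this is not its code), the forward-only rule can be sketched as below. It assumes each state may only advance to the next one listed; the real model may allow more, given that the controller exposes actions such as `#cancel` and `#repost`.

```python
LOAD_STATES = ["posted", "bid", "awarded", "in_transit", "delivered", "invoiced"]


def can_transition(current: str, target: str) -> bool:
    """Allow only a single step forward along the listed states."""
    return LOAD_STATES.index(target) == LOAD_STATES.index(current) + 1


def advance(current: str) -> str:
    """Return the next state, or raise if the load is already terminal."""
    position = LOAD_STATES.index(current)
    if position == len(LOAD_STATES) - 1:
        raise ValueError(f"{current!r} is the terminal state")
    return LOAD_STATES[position + 1]


print(can_transition("posted", "bid"))        # True
print(can_transition("posted", "awarded"))    # False: skips "bid"
print(can_transition("delivered", "posted"))  # False: backwards
print(advance("in_transit"))                  # delivered
```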

From	→	To	Notes
Broker	has_many	Dispatcher
Broker	has_many	Shipper
Dispatcher	has_many	Load
Shipper	has_many	Load	source of business
Load	has_many	Bid
Load	belongs_to	chosen_bid (Bid, optional)
Load	has_one	Shipment
Bid	belongs_to	Carrier
Bid	belongs_to	Load
Shipment	has_many	Document	dependent: destroy
Shipment	has_one	Invoice
Shipment	has_one	Settlement
Invoice	has_many	Payment
Settlement	has_many	SettlementLineItem
Carrier	has_many	CarrierUser
Carrier	has_many	Bid
Carrier	has_many	Lane

Controller	Routes	Notes
ApplicationController	—	Sets current_user, current_broker; before_action heavy.
DashboardController	GET /	Different views per user role.
LoadsController	resources	+ custom `#repost`, `#cancel`, `#extend_window`.
BidsController	resources	nested under loads.
ShipmentsController	resources	+ `#mark_delivered`.
InvoicesController	resources	+ `#mark_paid`, `#mark_disputed`, `#export_pdf`.
SettlementsController	resources	read-only for everyone except SuperAdmin.
DocumentsController	resources	direct-to-S3 upload; presigned URLs.
CarriersController	resources	+ `#approve`, `#suspend`, `#reactivate`.
ShippersController	resources
DispatchersController	resources	mass-assignment via strong_params needs audit (§13).
BrokerSettingsController	settings/*	wraps Setting singleton.
ApiTokensController	resources	for Carrier API access.
WebhooksController	resources	outbound webhook configuration.
AuditLogController	GET /audit	search via Ransack.
NotificationsController	index, mark_read
FuelSurchargesController	index, current
ReportsController	various	mostly Ransack-backed CSV exports.
OnboardingController	wizard/*	new-Carrier 5-step flow.
HealthController	GET /health	returns 200 + version.
Devise::*	auth/*	sign_in, sign_up, password reset.
HomeController	GET / (signed-out)	marketing landing.
AdminController (suite)	/admin/*	Active Admin engine.
RailsAdminController	/rails_admin (disabled in prod)	should be removed entirely.

Job	Schedule	Failure alert	Notes
`FuelReindex`	nightly 02:00 UTC	none	Updates FuelSurcharge from API; if it fails silently, all next-day bids use stale prices. (§13)
`WeeklySettlementRun`	Mondays 06:00 UTC	Slack	Big — runs in 4 minutes p99. The money path.
`InvoiceDelinquencyNudge`	daily 09:00 broker-tz	Slack on raise	Timezone bug in tests (skipped). (§13)
`DocumentVirusScan`	per-upload	Sentry	ClamAV via `clamby`.
`WebhookDeliveryWorker`	per-event	Sentry	Exponential backoff up to 24h.
`SmsSendWorker`	per-event	Sentry	Twilio.
`EmailSendWorker`	per-event	Sentry	ActionMailer/SendGrid.
`StaleLoadCloser`	hourly	Slack	Closes Loads in `posted` past pickup window.
`InvoicePdfRenderer`	per-invoice	Sentry	Wicked PDF.
`AuditLogPurger`	weekly	none	7-year retention; deletes older.
`MetricsRollup`	hourly	none	Populates `metrics_*` tables for the dashboard.
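
`WebhookDeliveryWorker` is listed with exponential backoff up to 24h, but the table gives neither the base delay, the growth factor, nor whether 24h caps a single delay or the whole retry window. The sketch below picks one interpretation purely for illustration: a 30-second base, doubling each attempt, and retries stopping once the cumulative wait would exceed 24 hours. All three parameters and the function name are assumptions.

```python
def backoff_schedule(
    base_sec: int = 30, factor: int = 2, window_sec: int = 24 * 3600
) -> list[int]:
    """Delays before each retry, stopping when the total would pass the window."""
    delays: list[int] = []
    delay, elapsed = base_sec, 0
    while elapsed + delay <= window_sec:
        delays.append(delay)
        elapsed += delay
        delay *= factor
    return delays


schedule = backoff_schedule()
print(len(schedule))   # 11 retries
print(schedule[:4])    # [30, 60, 120, 240]
print(schedule[-1])    # 30720 seconds, about 8.5 hours
print(sum(schedule))   # 61410 seconds, about 17 hours in total
```

Production retry schedules usually add random jitter to each delay so that many failed deliveries do not retry in lockstep; that is left out here to keep the arithmetic visible.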

Table	Rows (prod)	Notes
audit_versions	4,812,901	`paper_trail`. Purged weekly.
webhook_deliveries	1,902,114	Could TTL aggressively.
documents	411,238	+ ~2 TB on S3.
shipments	188,442
loads	156,801
invoices	184,229
settlements	142,108
bids	802,144	~5 bids per load average.
notifications	1,402,889
metrics_daily	1,128,440	rollup table.