S01 · E04 · THE DESIGN AGENT · 16:9 · ANAMORPHIC · BLEACH
TheDesignAgent

The judgment layer for agent-built UI.

JTBD-calibrated UX and visual evaluation for agent-built UI. Your agent calls /discover before the build, /ux and /visual after. Findings cite the job — not a generic rubric. 5 free credits at signup. No card.

VHS · TRACKING OK · TIMECODE LOCKED · DOLBY SR
◢ 04 · THE GAP YOU KEEP HITTING

Your agent ships the wrong UI — the same three ways, every time.

Not a benchmark — the moves your agent reaches for when it doesn't know the job. You've fixed all three this week.

◢ 01 · DEFAULT

“It built another dashboard.”

— when the user just needed to do the thing.

The job was do the thing, not watch the thing. Your agent didn't know the difference.

◢ 02 · DEFAULT

“Tabs again.”

— even though the job is sequential.

High-stakes, sequential workflows split across status tabs. The default move when the agent doesn't know the flow.

◢ 03 · DEFAULT

“The shadcn defaults are showing.”

— and the diff passed review.

The UI ignores who's using it, when, and why. It looks fine. It's wrong. The user can't get through it.

◢ 05 · RECEIPTS · BEFORE / AFTER

Here's the same agent, the same model — with us in the loop.

Identical prompt. Identical model. The only difference is the brief before the build and the scoring after. Drag the slider to see what the judgment layer caught.

PR Brief · #847
opened 14m ago · auto-drafted
Fix race in webhook delivery dispatch loop
branch: feat/webhook-race-fix → main
author: agent / claude-4-opus
diff: +127 / −73 · 6 files
Files · 6 · by risk
src/webhooks/delivery.ts [payments] +47·23
src/webhooks/queue.ts +28·14
src/utils/mutex.ts [shared] +18·12
src/webhooks/types.ts +6·0
tests/webhooks/delivery.test.ts [+tests] +24·0
/CHANGELOG.md +4·0
Diff shape
+127 · 200 LOC touched · −73
Risk score
6.4 / 10 · high
low · med · high · crit
Replaces ad-hoc setTimeout retry with a two-lock dispatcher. Closes the duplicate-charge race seen in INC-2418. Touches payment-critical code in 1 of 6 files.
coverage: +2.1%
bench p99: −4 ms
deps: 0 added
migrations: none
src/webhooks/delivery.ts · hunk 1 of 3 · the line concern #1 points at
141     async dispatch(event: WebhookEvent) {
142 +     await this.queueLock.acquire();
143 +     await this.replayLock.acquire();   ← lock-order risk
144       try {
145         return await this.deliver(event);
146       } finally {
147 +       this.replayLock.release();
148 +       this.queueLock.release();
149       }
AI-flagged concerns · 3 · self-reported by agent
01 · Lock order can deadlock against the retry worker
New dispatch acquires queueLock → replayLock. The retry worker at queue.ts:88 acquires them in the opposite order. Concurrent fire is reachable from the hourly cron.
delivery.ts:143–149
02 · Test only covers the single-dispatcher happy path
No test simulates two workers dispatching the same event in parallel, which is the case this fix is intended to address.
delivery.test.ts:14–42
03 · WebhookRetryAdapter deleted; downstream not checked
In-repo grep finds no callers. External services (billing-cron, ops-dash) were not scanned by this agent.
queue.ts:201
Estimated read: 3 min
vs. budget: −1m under 2m goal
Verify before ship · 0/3
  • Read mutex.ts diff (8 lines)
  • Replay one Stripe + one GitHub fixture locally
  • Confirm no external consumers of WebhookRetryAdapter
Agent record · last 5 · 4 / 5 shipped clean
#841 · rate-limit headers
#839 · idempotency keys
#836 · first pass missed RLS
#832 · flake in prod
#828 · retry backoff
Before · PR Brief #847 (above)
After · PR review · with TheDesignAgent
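
Concern 01 in the PR brief above describes a lock-order inversion: two code paths take the same two locks in opposite orders. One common remedy, sketched here under assumptions, is to route every caller through a single helper that always acquires in one fixed order. The `Mutex` class and the `withBothLocks` helper are hypothetical stand-ins for illustration; they are not the contents of src/utils/mutex.ts or queue.ts.

```typescript
// Hypothetical promise-based mutex for illustration only.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  // Resolves with a release function once every earlier holder has released.
  acquire(): Promise<() => void> {
    let release: () => void = () => {};
    const next = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    const ready = this.tail.then(() => release);
    this.tail = next;
    return ready;
  }
}

const queueLock = new Mutex();
const replayLock = new Mutex();

// Single choke point: queueLock first, replayLock second, for every caller.
// If both the dispatcher and the retry worker go through this helper,
// neither can hold one lock while waiting on the other in reverse order.
async function withBothLocks<T>(work: () => Promise<T>): Promise<T> {
  const releaseQueue = await queueLock.acquire();
  try {
    const releaseReplay = await replayLock.acquire();
    try {
      return await work();
    } finally {
      releaseReplay(); // release in reverse order of acquisition
    }
  } finally {
    releaseQueue();
  }
}
```

Nesting the `try` blocks also means `queueLock` is released if acquiring `replayLock` fails, which the hunk shown above does not guarantee.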
◢ 05 · TONIGHT'S SEGMENTS · THREE TOOLS

One MCP. Three tools. Calibrated to the job — not a generic rubric.

/discover before the build, /ux after, /visual when the screenshot matters. Same agent. Same model. New behavior.
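
The ordering described above (brief first, build second, scoring last) can be sketched as a small wrapper around an agent's build step. Everything named here is an assumption made for illustration: the `JudgmentTools` interface, the `Brief` and `Finding` shapes, and `buildWithJudgment` are hypothetical, and the real tool inputs and outputs may look different.

```typescript
// All names below are hypothetical; the real tool schemas may differ.
interface Brief {
  job: string; // who is using the UI and what is at stake
}

interface Finding {
  location: string; // e.g. "src/Form.tsx:42"
  fix: string;
}

interface JudgmentTools {
  discover(repoPath: string): Promise<Brief>;
  ux(brief: Brief, files: string[]): Promise<Finding[]>;
  visual(brief: Brief, previewUrl: string): Promise<Finding[]>;
}

interface BuildResult {
  files: string[];
  previewUrl?: string; // present only when there is something to screenshot
}

async function buildWithJudgment(
  tools: JudgmentTools,
  repoPath: string,
  build: (brief: Brief) => Promise<BuildResult>,
): Promise<Finding[]> {
  // 1. Brief before the build.
  const brief = await tools.discover(repoPath);
  // 2. The agent builds against the brief.
  const result = await build(brief);
  // 3. UX scoring after the build.
  const findings = await tools.ux(brief, result.files);
  // 4. Visual scoring only when the screenshot matters.
  if (result.previewUrl !== undefined) {
    findings.push(...(await tools.visual(brief, result.previewUrl)));
  }
  return findings;
}
```

Passing the brief into both the build and the scoring calls is the point of the sketch: findings are graded against the same job the agent was asked to build for.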

◢ SEG · 01
/discover

Reads your system first.

— a JTBD brief, before the agent draws a pixel —

  • JTBD — who's using it, what's at stake.
  • Heuristics — picked for this job.
  • Patterns — surface, density, rhythm.
  • Project model — tokens, components. Cached.
◢ SEG · 02
/ux

Scores against the brief.

— findings cite the job + the file:line —

  • Heuristic check — Nielsen, JTBD-shaped.
  • Cognitive load — density vs. the work.
  • Pattern fit — graded vs. Discover.
  • Findings — file:line + the fix.
◢ SEG · 03
/visual

What code review can't see.

— headless screenshot, scored vs. your tokens —

  • Brand fit — palette, type, hierarchy.
  • Density — rhythm vs. JTBD profile.
  • Contrast — AA at ship level.
  • Diff — vs. last passing run.
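
The Contrast bullet refers to WCAG level AA. How /visual measures it is not stated on this page; the sketch below only shows the standard WCAG 2.x contrast-ratio formula, with the 4.5:1 threshold for normal-size text and 3:1 for large text. The function names are illustrative.

```typescript
// Relative luminance of an sRGB color given as "#rrggbb" (WCAG 2.x definition).
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-f]{6}$/i.test(hex)) {
    throw new Error(`expected #rrggbb, got ${hex}`);
  }
  const n = parseInt(hex.slice(1), 16);
  const [r, g, b] = [(n >> 16) & 255, (n >> 8) & 255, n & 255].map((c) => {
    const s = c / 255;
    // Linearize the gamma-encoded channel.
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// AA requires at least 4.5:1 for normal text and 3:1 for large text.
function passesAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

On a white background, `#767676` clears the normal-text threshold and `#777777` narrowly misses it, which is the kind of near-miss that looks identical in a diff.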
◢ 06 · WHO'S TUNED IN

Built for whoever owns the UI.

Four roles. Same gap. One install closes it — without changing your agent, your stack, or your taste.

◢ ROLE · 01
Builders

Ship right the first time.

— for the builder who's redone agent UI three times this week —

  • Same agent, same model — new behavior, no new tool.
  • Brief before pixels — score after. Findings cite file:line.
  • One install — every repo, every editor.
◢ ROLE · 02
Forward Deployed

Drop in any repo. Any stack.

— custom UI in customer code, every week —

  • No infra — one pointer file. Stdio MCP.
  • Reads tokens — ships in the customer's voice.
  • file:line + fix — agent applies before opening the PR.
◢ ROLE · 03
Designers

Your taste becomes the constraint.

— stop watching your system flatten into shadcn defaults —

  • Tokens enforced — palette, type, hierarchy at the pixel.
  • Pattern fit — graded against your brief, not a generic rubric.
  • The agent — has to pass through you.
◢ ROLE · 04
Product Managers

Out of the QA loop for agent slop.

— every build scored before it reaches you —

  • JTBD-calibrated — findings cite the actual job.
  • file:line + the fix — agent applies before the PR.
  • Reviewable — every run a diff. Nothing AI-slop lands.
◢ MIDROLL · DISPATCH FROM THE EDITOR
AI-generated UI looks AI-generated.
That's a bug — not a vibe.
The Manifesto · entry 01 of 05
read the full transmission with $ cat manifesto
◢ INTERSTITIAL · 02:14

Your agent will not become a designer. It can ship like one is in the loop.

◢ 12 · SUBSCRIPTION PACKAGES · BILLED MONTHLY · CANCEL ANYTIME

Five credits to start. A subscription that pays for itself.

Every plan drops credits in monthly and resets at the end of the cycle. Bonus credits roll over and never expire — even after cancellation. We don't charge for errors, and you pay half on degraded calls.

5 free credits

Start free. No card.

Five credits drop in at signup. At the per-call prices listed below, that covers 1 /discover + 2 /ux, or 1 /ux + 1 /visual. Use them in any repo, any agent. When they run out, pick a channel below.

no credit card · install in 60s · credits never expire
start free →
1 credit · /discover · project model + JTBD brief, cached after first call
2 credits · /ux · 4-stage JTBD evaluation with file:line findings
3 credits · /visual · headless screenshot + token-fit scoring
CH · 01 · Starter
$27/mo
30 credits / month · +3 bonus rolls over
  • ~15 /ux runs/mo
  • $0.90 / credit
$ choose starter →
CH · 03 · Studio
$129/mo
160 credits / month · +31 bonus rolls over
  • ~80 /ux runs/mo
  • $0.81 / credit
$ choose studio →
CH · 04 · Team
$249/mo
320 credits / month · +71 bonus rolls over
  • ~160 /ux runs/mo
  • $0.78 / credit
$ choose team →
  • $0 on pipeline errors
  • 50% off on degraded runs
  • Bonus credits never expire
  • Cancel anytime from your dashboard
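
The pricing arithmetic above can be checked with a short sketch. The per-call costs, plan prices, and monthly credit counts are taken from this page; the function names are illustrative, and charging half-credits as fractions (rather than rounding) is an assumption, since the page does not say how a halved odd cost is handled.

```typescript
// Per-call credit costs as listed on this page.
const CREDIT_COST = { discover: 1, ux: 2, visual: 3 } as const;
type Tool = keyof typeof CREDIT_COST;
type Outcome = "ok" | "degraded" | "error";

// Credits charged for one call: full price normally, half on degraded runs,
// nothing on pipeline errors.
// Assumption: a halved cost is charged as a fraction, not rounded.
function creditsCharged(tool: Tool, outcome: Outcome = "ok"): number {
  if (outcome === "error") return 0;
  const full = CREDIT_COST[tool];
  return outcome === "degraded" ? full / 2 : full;
}

// Plans as listed: monthly price in USD and monthly credits (bonus excluded).
const PLANS = {
  starter: { usd: 27, credits: 30 },
  studio: { usd: 129, credits: 160 },
  team: { usd: 249, credits: 320 },
} as const;
type Plan = keyof typeof PLANS;

// Price per monthly credit, rounded to the cent.
function usdPerCredit(plan: Plan): number {
  const { usd, credits } = PLANS[plan];
  return Math.round((usd / credits) * 100) / 100;
}

// How many /ux runs the monthly credits cover if spent on nothing else.
function uxRunsPerMonth(plan: Plan): number {
  return Math.floor(PLANS[plan].credits / CREDIT_COST.ux);
}

// Total credits for a sequence of successful calls.
function totalCredits(tools: Tool[]): number {
  return tools.reduce((sum, tool) => sum + creditsCharged(tool), 0);
}
```

For example, `totalCredits(["discover", "ux", "ux"])` comes to the five free credits, and the per-credit figures on the plan cards match the price divided by the monthly credits without the bonus.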

Building an agent that buys its own usage? agent-pay link ↗