v1.3.03 loop iterations

Retrospective Runner

Most retros produce the same two findings: 'communication could be better' and 'we should plan more.' Both are true for every project ever and change nothing. The Retrospective Runner drills past the platitudes to root causes — and produces testable experiments, not resolutions.

What this skill does

The reason most retros produce platitudes is that they stop at the first answer. "Communication was bad" gets written on a sticky note, everyone nods, and the action item becomes "communicate better." That changes nothing because it doesn't address what was actually broken. This skill enforces the 5-Whys drill on every significant finding until you reach something structural — usually a process, template, tool, or norm — rather than a vague behavioural ask.

It starts with facts. Before opinions are allowed in the room, the skill builds a scorecard: planned vs actual on timeline, budget, scope, and the original success criteria. This prevents the retro from becoming a vibes-based reconstruction of what happened. With facts established, the 4L framework — Loved, Learned, Lacked, Longed For — gives structure to the discussion. Then the most important step: drill the top findings until you have a root cause that's actionable, not a symptom that's just observable.

Then the rule that separates real retros from ritual ones: experiments, not resolutions. Resolutions are forever and never reviewed. Experiments have a stop condition, a duration, a measure, and an owner. "We'll communicate better" is rejected. "For the next two sprints, we'll post a one-line decision log in #project-decisions after every meeting, owned by [name], and review at the next retro" is accepted. Maximum 3-5 experiments per retro — more than that and none get implemented.

Special protocols handle the failure modes. If the same issue appears in 3+ consecutive retros, previous experiments have failed and the skill escalates the approach — usually to structural solutions (mandatory written decision logs, tool changes, process changes) rather than behavioural asks. Psychological safety is handled explicitly: anonymous pre-collection for sensitive topics, leader speaks last (so they don't anchor the room), and an explicit "thank the critic" norm because punishing honesty in one retro kills honesty in all future ones. Not every project needs a formal retro — a 2-day task gets a 5-minute journal entry, not a 60-minute meeting. The skill scales the format to the work that was done.

When this triggers

·A project just ended and you want to actually learn from it, not just write a feel-good summary
·The same problems keep showing up retro after retro and you suspect the previous 'actions' never landed
·You're running a solo retro and don't want it to become either self-congratulation or self-flagellation
·A project failed and you need to write the post-mortem without it becoming a blame session
·You want experiments with a stop-condition, not a list of resolutions that get filed and forgotten

Example

Trigger

User says: 'We just finished the [migration] project. Two months late, over budget, the team is exhausted. I need to run the retro on Friday.'

Output

Retro structure (60 min): 1. PROJECT SCORECARD (5 min, facts not opinions): Timeline: planned 12wk / actual 20wk · Budget: 1.4x · Scope: delivered core, cut 3 nice-to-haves · Original criteria: 4 of 5 met 2. 4L (25 min): Loved · Learned · Lacked · Longed for (anonymous pre-collection recommended — sensitive project) 3. 5-WHY DRILL on top 2 findings (15 min): e.g. "estimates were 60% off" → why? we estimated from feature list, not from spike data → why? no time budgeted for spikes → why? PM treated discovery as already done at kickoff → ROOT CAUSE: kickoff template doesn't include a discovery phase for unfamiliar tech. 4. EXPERIMENTS (15 min, max 3): EXPERIMENT 1: Add 1-week discovery spike to next project before committing dates. MEASURE: estimate accuracy on first milestone. OWNER: [name]. STOP IF: spike adds >2wk without changing estimate. 5. APPRECIATIONS (5 min) — specific, named. Psychological safety: leader speaks last. No attributions in 4L.

Get this skill + 15 more

Included in the The Agency Owner Stack — scale delivery without scaling headcount. Save $130+ vs buying individually.

Get The Agency Owner Stack — $149

Also in

The Startup Founder Stack · $149

What you get

189-line SKILL.md, ready to drop into ~/.claude/skills/
Tested through 3 Karpathy-loop iterations (versions v1.0.0 → v1.3.0)
Triggers automatically when relevant — no command to remember
Lifetime updates as the skill is refined further