v1.3.03 loop iterations

Literature Review

Literature reviews fail in two directions — five sources Googled in ten minutes (false confidence), or 200 papers with no synthesis (unusable). The Literature Review skill produces the middle: a thematic synthesis where sources support findings, evidence strength is graded honestly, and gaps are treated as findings in their own right.

What this skill does

A literature review isn't a bibliography with prose between the entries. It's a thematic argument supported by sources. The shift is small in words and large in usefulness: instead of "Smith (2022) found X. Jones (2023) found Y. Patel (2024) found Z." you get "On theme T, evidence is strong because Smith, Jones, and Patel converge via independent methods — with one caveat raised by Patel that the others don't address." Sources become evidence; themes become findings. That's where the review starts paying for itself.

The skill runs every source through the CRAP test — Currency, Reliability, Authority, Purpose — and tiers them. Tier 1 (peer-reviewed, authoritative) anchors the review. Tier 2 (credible but with limitations) supports. Tier 3 (blog posts, opinions) is used sparingly and labelled clearly. The output never lets a single anecdotal source carry a finding it can't bear, and never gives a vendor-funded report the same weight as a meta-analysis. Evidence strength tags — Strong, Moderate, Weak, Contested — sit next to every finding so the reader knows what to defend hard and what to caveat.

Disagreements get diagnosed, not just flagged. "Sources disagree" is useless. The skill names the likely cause — different definitions of the same term, different time periods, different sample populations, one source simply outdated. Gaps get their own section because identifying what isn't known is often more valuable than restating what is. A topic with five strong papers and a conspicuous blind spot is a research opportunity, not a finished question.

Protections built in: a thin-literature protocol for topics with fewer than five credible sources (no extrapolation, the gap is the finding), an anti-cherry-picking protocol when users want a review that confirms a foregone conclusion (the review presents all relevant evidence, even when uncomfortable), and a methodology critique pass on Tier 1 sources to flag small samples, missing controls, industry-funded research on the funder's own product, and statistical-significance findings with trivially small effect sizes. Every review ends with a "translating to action" section that names the strongest defensible claim, the weakest, and the open questions where judgment has to fill in for evidence — so the work doesn't stay academic.

When this triggers

·You're writing a report, dissertation, or briefing that needs to be grounded in evidence
·You want to know what the research actually says before forming a position
·You've collected sources and need them organised by theme, not by paper
·You suspect the evidence is more mixed than the popular consensus suggests
·You need a defensible review you can hand to a sceptical reader without embarrassment

Example

Trigger

User: 'Do four-day work weeks actually work? Pulling together a board briefing. 8 sources collected.'

Output

Research question: Does a four-day work week sustain or improve productivity and wellbeing in white-collar work? Scope: 8 sources reviewed (2 Tier 1 peer-reviewed trials, 4 Tier 2 industry reports, 2 Tier 3 op-eds — used sparingly) SUMMARY OF FINDINGS · Wellbeing gains: STRONG evidence · Short-term productivity: MODERATE evidence · Long-term productivity at scale: WEAK evidence (only one study runs past 18 months) · Effect varies by job type: STRONG but under-reported THEME 1 — Wellbeing improves consistently Consensus across 6 of 8 sources. Burnout markers drop, sleep improves, attrition declines in 6-12mo windows. Strongest evidence: 4 Day Week Global trial (peer-reviewed, n=2,900). THEME 2 — Productivity claims are noisier than reported Headline "no productivity loss" is supported in 4 sources but methodologies vary widely. Self-reported productivity ≠ measured output. Source C (industry report from a vendor) inflates the finding — discount accordingly. THEME 3 — Job-type effect under-discussed All sources that disaggregate find different results for individual contributors vs management/creative roles. This is conspicuously missing from the public conversation. AREAS OF DEBATE Whether the productivity finding holds beyond 18 months. Two sources extend that far; both show some regression to baseline. GAPS · No long-run study on a four-day week at scale (5+ years) · Customer-facing/shift work largely absent from the evidence · No source addresses mid-management role specifically TRANSLATING TO ACTION (board briefing) Strongest defensible claim: wellbeing improves; short-term productivity is preserved in trial conditions. Weakest claim: productivity at scale long-term. Recommendation: 6-month pilot in a measurable team, not company-wide commitment based on current evidence.

Get this skill + 5 more

Get the full Research Intelligence pillar (6 skills) or the complete library.

Get the full stack — $299

What you get

238-line SKILL.md, ready to drop into ~/.claude/skills/
Tested through 3 Karpathy-loop iterations (versions v1.0.0 → v1.3.0)
Triggers automatically when relevant — no command to remember
Lifetime updates as the skill is refined further