v1.3.0 · 3 loop iterations

Spreadsheet Whisperer

Most people paste a CSV into a chatbot and get a narration of what they just pasted. The Spreadsheet Whisperer runs a data health check, classifies the data type, profiles the columns, finds the segments, changes, anomalies, and numbers that actually matter, and ends with three to five recommendations specific enough to act on.

What this skill does

People stare at spreadsheets and see rows. They paste data into an LLM and get a narration of what they just pasted. Neither is useful. The gap between "here's data" and "here's what this data means for your decisions" is what this skill fills, by running a five-step pipeline — health check, profile, SCAN, visualise, recommend — instead of summarising what's already on screen.

Data health comes first because bad data produces bad insights. Missing values get reported by column and percentage. Duplicates get flagged, both exact copies and near-duplicates where the same entity appears with different formatting. Format inconsistencies get standardised automatically and then reported: currency symbols stripped, dates ISO-normalised, common null markers (N/A, blank, -) unified. Outliers get flagged but not auto-fixed, because removing them is a judgement call for you to make, not the skill. Datasets under 20 rows switch to descriptive mode: no correlations, no distributions, no false aggregates from too little data.
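The health-check rules above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the skill's actual implementation: the function names (clean_cell, health_report), the two accepted date formats, and the currency symbols handled are all assumptions made for the example.

```python
import re
from datetime import datetime

# Null markers named in the text (N/A, blank, -); compared case-insensitively.
NULL_MARKERS = {"", "n/a", "-"}
# Assumed input date formats; a real file may need more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def clean_cell(value):
    """Return None for null markers, an ISO date string for dates,
    a float for numbers/currency, otherwise the stripped text."""
    text = value.strip()
    if text.lower() in NULL_MARKERS:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass
    # Strip a leading currency symbol and thousands separators, then try a number.
    stripped = re.sub(r"^[£$€]\s*", "", text).replace(",", "")
    try:
        return float(stripped)
    except ValueError:
        return text


def health_report(rows):
    """rows: list of dicts mapping column name -> raw string.
    Returns (cleaned_rows, report)."""
    cleaned = [{col: clean_cell(val) for col, val in row.items()} for row in rows]
    columns = list(rows[0]) if rows else []
    missing_pct = {
        col: round(100 * sum(r[col] is None for r in cleaned) / len(cleaned), 1)
        for col in columns
    }
    # Count duplicates after cleaning, so rows that differ only in
    # date or currency formatting are caught as well.
    seen, duplicates = set(), 0
    for r in cleaned:
        key = tuple(r[c] for c in columns)
        duplicates += key in seen
        seen.add(key)
    return cleaned, {
        "rows": len(cleaned),
        "missing_pct": missing_pct,
        "duplicates": duplicates,
        "descriptive_mode": len(cleaned) < 20,  # under 20 rows: describe only
    }


rows = [
    {"date": "15/03/2024", "revenue": "£1,200", "refunds": "N/A"},
    {"date": "2024-03-15", "revenue": "1200", "refunds": "-"},
    {"date": "2024-04-15", "revenue": "$900.50", "refunds": "12"},
    {"date": "2024-05-15", "revenue": "", "refunds": "0"},
]
cleaned, report = health_report(rows)
print(report)
```

The first two sample rows differ only in formatting, so they count as one duplicate once cleaned. Outliers are deliberately left untouched here, in line with the flag-but-don't-fix rule.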

The SCAN framework finds the signal. Segments — are there meaningful groups, top vs bottom, regions, time periods, product categories, and what differs between them. Changes — what's moving over time, growth rates, acceleration, inflection points. Anomalies — what doesn't fit the pattern, with each one flagged for investigation rather than smoothed away. Numbers that matter — the three to five KPIs someone would actually pin to a dashboard from this data, calculated and compared to benchmarks where context allows. The output prioritises what's surprising, not what's exhaustive.
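Three of the four SCAN lenses reduce to simple arithmetic, sketched below under stated assumptions. The function names and the anomaly threshold (a deviation of more than 100% from the mean) are hypothetical choices for illustration; the skill's own thresholds are not documented here.

```python
from statistics import mean


def segment_shares(rows, key, value="revenue"):
    """Segments: share of total `value` per group, largest first."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[value]
    grand = sum(totals.values())
    return sorted(((k, v / grand) for k, v in totals.items()), key=lambda kv: -kv[1])


def period_changes(series):
    """Changes: period-over-period growth rates for an ordered list of values."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def flag_anomalies(series, threshold=1.0):
    """Anomalies: indices whose value deviates from the series mean by more
    than threshold * mean. Flags only; nothing is removed or smoothed."""
    avg = mean(series)
    return [i for i, v in enumerate(series) if abs(v - avg) / avg > threshold]


sales = [
    {"sku": "A", "revenue": 60},
    {"sku": "B", "revenue": 30},
    {"sku": "A", "revenue": 10},
]
shares = segment_shares(sales, "sku")
growth = period_changes([100, 110, 121])
spikes = flag_anomalies([100, 110, 121, 400, 120])
print(shares, growth, spikes)
```

Note that the mean used as the baseline includes the spike itself, which understates how unusual it is; a median or a leave-one-out average would be a more robust choice for short series.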

Data type classification changes which analyses are valid. Time-series enables trend detection, seasonality decomposition, and forecasting. Cross-sectional enables comparison, segmentation, and ranking. Panel data enables both. The skill classifies the data first, runs the analyses the classification supports, and refuses to run the ones it doesn't (no forecasting from a single snapshot, no segment comparison when the data only has one entity). This catches the most common analysis mistake — applying the wrong technique to the wrong data shape.
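The classify-then-gate idea can be sketched as a count of distinct entities and distinct periods, followed by a lookup of which analyses that shape supports. The function names and the analysis labels are assumptions for the example, and real data would first need its entity and time columns identified.

```python
def classify(rows, entity_col, time_col):
    """Classify a dataset by how many distinct entities and periods it has."""
    entities = {r[entity_col] for r in rows}
    periods = {r[time_col] for r in rows}
    if len(entities) > 1 and len(periods) > 1:
        return "panel"
    if len(periods) > 1:
        return "time-series"
    return "cross-sectional"


# Analyses each shape supports, following the text; panel data enables both sets.
ALLOWED = {
    "time-series": {"trend", "seasonality", "forecast"},
    "cross-sectional": {"comparison", "segmentation", "ranking"},
}
ALLOWED["panel"] = ALLOWED["time-series"] | ALLOWED["cross-sectional"]


def check_analysis(kind, analysis):
    """Refuse an analysis the data shape does not support."""
    if analysis not in ALLOWED[kind]:
        raise ValueError(f"{analysis} is not valid for {kind} data")
    return analysis


snapshot = [
    {"product": "A", "date": "2024-01"},
    {"product": "B", "date": "2024-01"},
]
single_entity = [
    {"product": "A", "date": "2024-01"},
    {"product": "A", "date": "2024-02"},
]
kind_snapshot = classify(snapshot, "product", "date")
kind_single = classify(single_entity, "product", "date")
kind_panel = classify(snapshot + single_entity, "product", "date")

try:
    check_analysis(kind_snapshot, "forecast")
    refused = False
except ValueError:
    refused = True  # no forecasting from a single snapshot
print(kind_snapshot, kind_single, kind_panel, refused)
```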

You get a health check with auto-fixes named, a profile that highlights what's interesting (not every stat), the four-lens SCAN findings, two or three specific chart recommendations chosen to fit the data type rather than the generic line-chart-of-everything default, and three to five recommendations with the format Recommendation/Based on/Expected impact/Priority. The skill leads with the surprise, quantifies every claim, and ends with one question back — "to go deeper, I'd want to see [specific additional data]" — because the next iteration of an analysis usually matters more than the first. If the data is too small or noisy to draw conclusions, the skill says so. A confident wrong insight is worse than an honest uncertainty.
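The Recommendation/Based on/Expected impact/Priority format maps naturally onto a small record type. The sketch below is one possible shape, not the skill's internal format: the class, the three-level priority scale, and the placeholder impact text are all hypothetical.

```python
from dataclasses import dataclass

PRIORITIES = ("high", "medium", "low")  # assumed scale, not from the skill


@dataclass
class Recommendation:
    recommendation: str
    based_on: str
    expected_impact: str
    priority: str

    def __post_init__(self):
        if self.priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {PRIORITIES}")

    def render(self):
        return (
            f"Recommendation: {self.recommendation}\n"
            f"Based on: {self.based_on}\n"
            f"Expected impact: {self.expected_impact}\n"
            f"Priority: {self.priority}"
        )


def validate(recs):
    """Enforce the three-to-five rule before anything is shown."""
    if not 3 <= len(recs) <= 5:
        raise ValueError("expected three to five recommendations")
    return recs


rec = Recommendation(
    recommendation="Diversify off SKU-3",
    based_on="SKU-3 is 38% of revenue",
    expected_impact="(illustrative placeholder) lower single-product exposure",
    priority="high",
)
print(rec.render())
```

Requiring a non-empty "Based on" field for every record is one simple way to hold the output to the quantify-every-claim rule.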

When this triggers

· You've pasted a CSV and want a real read, not a description of your own data
· You've inherited a messy spreadsheet and need to know whether it's clean enough to analyse
· You want to know what's interesting in the data without reading every row
· You need a chart recommendation that fits the data type rather than a generic one
· You suspect the data has a problem (duplicates, outliers, inconsistent dates) and want it flagged before any analysis

Example

Trigger

User pastes 18 months of monthly sales data: 540 rows, columns for date, product, channel, region, revenue, units, refunds.

Output

DATA HEALTH: AMBER
Rows: 540 | Columns: 7 | Complete rows: 96%

Issues found:
· 'Refunds' column: 4% missing, likely meaning £0 rather than unknown. Auto-filled as 0 with flag.
· Dates: mixed DD/MM/YYYY and YYYY-MM-DD. Standardised to ISO.
· 3 duplicate (date, product, channel) rows in March: kept highest revenue value, flagged for user review.
· 'Region': 41 unique values for what should be ~10 regions. Likely spelling/casing variance ("london" vs "London").

Classification: Panel data (entities across time)

Key Profile
Revenue: median £8,400/month, mean £11,200, right-skewed by a small number of large months. Report median.
Date range: Jan 2024-Jun 2025, no gaps.
Products: 12 SKUs, top 3 are 71% of revenue (concentration).

Insights (SCAN)
Segments: SKU-3 is 38% of revenue alone. Lose it and more than a third of revenue goes with it.
Changes: Direct channel revenue accelerating (+34% YoY). Wholesale flat. Marketplace declining (-12%).
Anomalies: April 2025 spike (+162% vs avg) from a single wholesale order, not a trend. Strip from baseline before any forecast.
Numbers that matter: Repeat customer rate, AOV trend, refund rate by channel. Calculated below.

Recommended visualisations
· Channel revenue stacked over 18 months (line, not bar)
· Top-10 SKU horizontal bar by revenue, sorted
· Refund rate by channel: bar with industry benchmark line

Recommendations
1. Diversify off SKU-3 (38% concentration), high priority
2. Double down on Direct (the only accelerating channel)
3. Investigate marketplace decline before next quarter's plan
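Two of the fixes named in the sample output, keeping the highest-revenue row among duplicates and collapsing casing variants of a region name, can be sketched as follows. The helper names and the title-case rule are assumptions for illustration; the sample rows are made up and are not taken from the 540-row dataset.

```python
def dedupe_keep_highest(rows, key_cols=("date", "product", "channel"), value="revenue"):
    """Keep one row per key (the one with the highest `value`) and
    return the keys that had duplicates so they can be flagged for review."""
    best, flagged = {}, []
    for r in rows:
        key = tuple(r[c] for c in key_cols)
        if key in best:
            flagged.append(key)
            if r[value] > best[key][value]:
                best[key] = r
        else:
            best[key] = r
    return list(best.values()), flagged


def normalise_region(name):
    """Collapse whitespace and casing variance: ' london ' -> 'London'."""
    return " ".join(name.split()).title()


sample = [
    {"date": "2025-03-01", "product": "SKU-1", "channel": "Direct", "revenue": 400},
    {"date": "2025-03-01", "product": "SKU-1", "channel": "Direct", "revenue": 500},
    {"date": "2025-03-01", "product": "SKU-2", "channel": "Direct", "revenue": 300},
]
kept, flagged = dedupe_keep_highest(sample)
regions = {normalise_region(n) for n in ["london", "London", " LONDON "]}
print(kept, flagged, regions)
```

Title-casing handles casing and whitespace only; genuine misspellings would need fuzzy matching and a human check before merging.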

Get this skill + 8 more

Get the full Data & Analysis pillar (9 skills) or the complete library.

Get the full stack — $299

What you get

· 152-line SKILL.md, ready to drop into ~/.claude/skills/
· Tested through 3 Karpathy-loop iterations (versions v1.0.0 → v1.3.0)
· Triggers automatically when relevant, with no command to remember
· Lifetime updates as the skill is refined further