Caelan's Domain

Part 6 — Measurement: Knowing What Works

Tags: ai, claude, cowork, measurement, kpis, analytics, cowork-skills, cowork-memory

Created: April 17, 2026 | Modified: April 21, 2026

Cowork Features Used: Skills, Memory, Scheduled Tasks
Quick Start
This Part builds on the functioning pipeline you wired up in Part 5, but can be followed with any operational data the role already produces.

Starter data
A small CSV of whatever the role already tracks — pipeline counts, response times, cycle times, conversion rates, SLA adherence — covering the last three reporting periods. Two or three periods is enough to baseline.

Measure Before You Automate

The Project has been shipping work. The pipeline from Part 5 drafts artifacts, checks them against rules, sequences next steps, and hands off to the next stage. Output is landing. Queues are moving. If this were a human hire, this is the point at which the question stops being "is the role producing?" and becomes "is any of it reading through to the numbers the business actually cares about?"

That is what this Part is for. You define what the Project is accountable to, in numbers, and then you hold it to them.

The temptation is to skip to automation — put the pipeline on a timer, walk away, and check back at the end of the quarter. That is Part 7. Do not skip to it. Automating without measurement is doubling throughput on a process nobody is reading the output of. More artifacts, faster, across more surfaces, with no read on whether any of them are producing the outcome the role exists to deliver. Speed without direction is expensive noise.

This Part gives you direction. You define the KPIs that match the role, build a Skill that reads performance data against a stored baseline, store the baseline in Memory, and set a cadence that surfaces real movement without drowning you in dashboards. Once the Project knows what success looks like, automating becomes a force multiplier instead of a gamble.

Pointer — CLAUDE.md declares what to measure
The KPIs section of CLAUDE.md is where the numbers for any role live. A Sales CLAUDE.md declares pipeline coverage, win rate, ACV, cycle time, and quota attainment. A Support CLAUDE.md declares first-response time, resolution time, CSAT, and escalation rate. An Operations CLAUDE.md declares process cycle time, SLA adherence, vendor renewal rate, and audit-pass rate. A Content, Brand, or Demand CLAUDE.md declares briefs-to-published rate, voice-consistency pass rate, pipeline-attribution percent, and engaged-audience growth. Same section, different metrics — CLAUDE.md is the authoritative record for KPIs, and every other surface (Skills, Memory, Scheduled Tasks) reads from it.

Define the KPIs

KPIs — key performance indicators — are the three to five numbers that connect the role's activity to a business outcome the CEO can read without a translator. The phrase sounds corporate. The discipline is simple: pick the handful of numbers that tell the truth about this role, and watch them over time. Choose the ones that match the goals stated in the CLAUDE.md back in Part 1, and write them into a dedicated KPIs section of the same CLAUDE.md.

The examples below rotate across roles so the pattern is clear. Every one of them has the same shape: a number, a definition, a target range (not a point value), and a read — what moving up or down actually means.

Output quality pass rate (any role with an output gate). Percent of drafts that clear the role's quality check on first pass — a content brief passing the voice rule, a lead qualifier returning Hot or Warm, a support draft passing the escalation rubric, an ops SOP clearing the audit template. Reads as whether the Skills and Rules together give the authoring surface enough context to produce shippable output by default. Target range: 70-90% on first pass; below 60% means the rule or the skeleton needs a revision pass.

Pipeline velocity / cycle time (any role that moves items through stages). Median time from entry to exit of the role's primary workflow. Sales: qualified-lead to closed-won. Content: brief to published asset. Support: ticket open to resolution. Ops: request submitted to request fulfilled. Reads as friction between stages. A cycle time that creeps up without a stated change means a stage has quietly become the bottleneck.

Conversion / throughput at the role's terminal stage (varies by role).

  • Sales: qualified MQL-to-SQL conversion; win rate on closed deals.
  • Content / Brand: briefs-to-published rate; pipeline attributed to content.
  • Support: first-contact resolution rate; escalation rate.
  • Ops: SLA adherence; vendor renewal rate; audit-pass rate.

Same KPI slot, role-specific metric. The KPIs section of CLAUDE.md names which one applies.

Coverage / pipeline health (any role with a forward-looking book). Sales: open pipeline value vs. remaining quota (healthy band roughly 3-4x). Content: count of briefs in flight vs. monthly publish target. Support: open-ticket backlog vs. team capacity. Ops: open requests vs. weekly throughput. Reads as whether the role is going to meet its commitment without a fire drill.

Forecast / promise accuracy (any role that commits a number forward). Sales: forecast accuracy vs. actuals by period. Ops: SLA-met rate vs. SLA-committed rate. Support: stated-ETA-held rate. Content: campaign performance vs. the Strategist's stated success metric. Reads as whether the role's commitments are trustworthy, which is the number every stakeholder upstream actually cares about.

Pick three to five for this role. Write them into the KPIs section of CLAUDE.md, each with a definition, a target range, and a short read. Match them to the goals stated elsewhere in the same CLAUDE.md. Target ranges, not point values — a KPI with a point value and no band is a trap that either always passes or always fails.
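To make the band discipline concrete, here is a minimal sketch of a KPI as a value plus a target range. The names and numbers are invented for illustration; nothing here is Cowork's internal representation.

```python
from dataclasses import dataclass

@dataclass
class KPI:
    """One KPI as declared in CLAUDE.md: a name, a value, a target band."""
    name: str
    value: float
    target_low: float
    target_high: float

    def status(self) -> str:
        # A band gives three honest answers; a point target gives
        # only pass/fail, which always passes or always fails.
        if self.value < self.target_low:
            return "below band"
        if self.value > self.target_high:
            return "above band"
        return "in band"

# Illustrative numbers only, not from any real pipeline.
coverage = KPI("pipeline coverage", 3.4, 3.0, 4.0)
pass_rate = KPI("first-pass quality", 0.62, 0.70, 0.90)
print(coverage.status())   # in band
print(pass_rate.status())  # below band
```

The read on each status is what the KPIs section of CLAUDE.md supplies; the band check itself is this simple.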


Build a Metrics Analysis Skill

You built Skills in Part 3 — the role's drafting Skill and its quality-check Skill. Same pattern here, but this Skill reads performance data instead of producing output: it takes the numbers, compares them to a stored baseline, and tells you what moved.

The route: /skill-creator. From Part 3 onward, /skill-creator is the standard path for new Skills. Open a new conversation in the Cowork Project, type /skill-creator, paste the prompt below, and answer the interview questions. On approval, the builder saves .claude/skills/metrics-analyzer/SKILL.md. The manual paste-block is preserved further below for readers who want to compare /skill-creator's generated skeleton against the hand-built version.

Paste this into /skill-creator:

Build a skill called "Metrics Analyzer" that:
- Accepts performance data (CSV, pasted table, or text summary) for the
  KPIs declared in the KPIs section of CLAUDE.md
- Compares current metrics to baseline values stored in Memory
  ("kpi-baseline")
- Identifies trends (improving, declining, flat) for each KPI
- Highlights the top 3 wins and top 3 concerns
- Recommends specific actions based on the data, each tied to a named
  KPI and a named surface (channel, stage, queue, process)
- Outputs a structured report with one subsection per KPI
- Flags any metric that has moved 20% or more from baseline in either
  direction, relative to a stated observation window
- Notes any metric that has held a 3-period trend in the same direction

/skill-creator will ask you follow-up questions. Answers the builder usually needs:

What inputs does the skill need? Performance data in any format — CSV, pasted table, or plain-text summary — covering the KPIs declared in the KPIs section of CLAUDE.md for this role. The skill handles whatever the role produces.

Should it reference anything from Memory? Yes. Baseline metrics stored under the Memory entry kpi-baseline. The skill compares current data to the baseline to identify trend and magnitude.

What format should the output take? A structured report with these sections: Executive Summary (3-4 sentences, bounded to the observation window), Per-KPI Performance (one subsection per KPI declared in the KPIs section of CLAUDE.md), Top 3 Wins, Top 3 Concerns, Recommended Actions, and a Data Quality note flagging any missing or suspicious numbers.

How should it handle missing data? Note the gap, analyze what it has, and state what the gap prevents. Do not refuse to produce a report because one KPI is missing; do not invent numbers to fill a gap.

Review the generated skill, adjust anything that does not match the KPIs section of CLAUDE.md, and save.

A closer look — Memory retrieval inside a Skill
This Skill reads the kpi-baseline entry from Cowork Memory on every run. The primary Memory mechanics callout lives in Part 1; the short version is that Memory is Cowork-managed, surfaced in the Project sidebar, and not a folder you can open in a text editor. When the Skill asks Memory for kpi-baseline, Cowork looks up the entry for this Project and loads it into the turn alongside CLAUDE.md and Rules. If the entry is missing, the Skill notes the gap and analyzes the new data on its own merits rather than inventing a baseline.

Here is roughly what /skill-creator generates. If you prefer to build the skill manually, paste this into a new skill called metrics-analyzer:

Analyze the performance data provided and produce a structured report
against the KPIs declared in the KPIs section of CLAUDE.md.

INPUTS
- Performance data: CSV, table, or text summary (provided by user)
- Baseline metrics: retrieved from Memory entry "kpi-baseline"
- KPI definitions and target ranges: declared in the KPIs section of
  CLAUDE.md for this role

ANALYSIS PROCESS
1. Parse the provided data and identify which declared KPIs it covers.
2. Retrieve baseline values from Memory.
3. Calculate period-over-period changes for each KPI, bounded to the
   observation window stated in the data.
4. Compare current values to baseline and to the target range
   documented in CLAUDE.md.
5. Identify trends: improving (3+ consecutive periods of growth toward
   target), declining (3+ periods away from target), or flat.
6. Flag any KPI that has moved 20%+ from baseline in either direction
   relative to the stated observation window.

OUTPUT FORMAT

## Performance Report — [Period Range]

### Executive Summary
3-4 sentences covering overall trajectory, biggest win, biggest concern,
and one recommended priority action. Bound every claim to the stated
observation window; do not generalize outside it.

### Per-KPI Performance
One subsection per KPI declared in the KPIs section of CLAUDE.md. For each:
- Current value vs. baseline vs. target range
- Trend direction and magnitude over the stated window
- One-sentence read of what the movement means

### Top 3 Wins
The three KPIs showing the strongest movement toward target. For each:
what improved, by how much, over which window, and what likely caused it.

### Top 3 Concerns
The three KPIs showing the most worrying trajectory. For each: what
declined or stalled, by how much, over which window, and what might
be causing it.

### Recommended Actions
3-5 specific, prioritized actions. Each action names the KPI it targets,
the surface it operates on (channel, stage, queue, process, rule file,
skill, agent), and the expected direction of impact. "Do more of X" is
not specific enough. "Increase the cadence of the Friday hygiene pass
from weekly to twice-weekly to test whether no-touch-days drops below
5" is.

### Data Quality Notes
Flag any missing KPIs, suspicious numbers (e.g., a 500% spike that
might be a tracking error), or gaps in the data that limit the analysis.

RULES
- Compare to baseline from Memory. If no baseline exists, note this and
  analyze the data on its own merits.
- Be direct about what is working and what is not. Do not soften bad news.
- Every recommended action must tie to a declared KPI and a named surface.
- If a KPI has moved 20%+ from baseline, flag it prominently with the
  observation window that produced the read.
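Steps 5 and 6 of the analysis process, the trend call and the 20% flag, reduce to a few lines of arithmetic. A minimal sketch in Python, assuming an oldest-first series of per-period values; the function names and structure are illustrative, not the skill's actual implementation:

```python
def trend(values: list[float], toward_target: bool) -> str:
    """Classify a KPI series as 'improving', 'declining', or 'flat'.

    values: one number per reporting period, oldest first.
    toward_target: True if larger values mean closer to target.
    """
    if len(values) < 3:
        return "flat"  # not enough periods to call a trend
    last3 = values[-3:]
    rising = all(b > a for a, b in zip(last3, last3[1:]))
    falling = all(b < a for a, b in zip(last3, last3[1:]))
    if rising:
        return "improving" if toward_target else "declining"
    if falling:
        return "declining" if toward_target else "improving"
    return "flat"

def flag_20pct(current: float, baseline: float) -> bool:
    """Flag a 20%+ move from baseline in either direction."""
    if baseline == 0:
        return False  # no meaningful percentage against a zero baseline
    return abs(current - baseline) / abs(baseline) >= 0.20

print(trend([0.61, 0.66, 0.72], toward_target=True))  # improving
print(flag_20pct(0.72, 0.58))                         # True
```

Note the toward_target flip: for cycle time, falling is improving, which is why each KPI's read lives next to its definition in CLAUDE.md.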

Baseline Your Current State

A trend requires at least two data points. Before the analyzer can tell you whether things are moving, it needs to know where the role started. That starting point is the baseline, and you store it in Memory — the same persistent context store introduced in Part 1 — so every future report reads from the same reference.

Gather current numbers. If the role already has instrumentation (CRM dashboards for sales, web and email analytics for content, ticketing for support, ERP or SLA reports for ops), export the last three reporting periods. If you are following along with the tutorial and want to test the Skill first, paste any small CSV the role already produces — pipeline counts, response times, cycle times, conversion rates — covering three periods. Two or three periods is enough to anchor a baseline without overfitting.

Paste the data into the Cowork Project with this prompt:

Store this as our performance baseline in Memory entry "kpi-baseline".
These are our most recent three reporting periods. Use the final period
as the primary baseline for future comparisons, and note the
period-over-period trend for each KPI declared in the KPIs section of
CLAUDE.md.

[paste your CSV here]

After storing, confirm what you saved and highlight any KPI that already
shows a clear trend direction over the stated window.

The Project parses the data, stores baseline values in Memory, and summarizes what it sees. The most recent period becomes the reference point. The three-period window gives trend context without pretending to a year of history the role does not yet have.
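For readers who want to see the arithmetic performed on those three periods, here is a rough sketch. The column names and numbers are invented for illustration; the real baseline lives in Cowork-managed Memory, not in a script.

```python
import csv, io

# Three illustrative periods, oldest first. Column names are assumptions
# for this sketch; paste whatever the role actually tracks.
raw = """period,win_rate,cycle_days
2026-01,0.21,34
2026-02,0.24,31
2026-03,0.26,29
"""

rows = list(csv.DictReader(io.StringIO(raw)))
final = rows[-1]  # the most recent period becomes the primary baseline

baseline = {k: float(final[k]) for k in final if k != "period"}
trends = {k: [float(r[k]) for r in rows] for k in baseline}

print(baseline)            # {'win_rate': 0.26, 'cycle_days': 29.0}
print(trends["win_rate"])  # [0.21, 0.24, 0.26] -- three periods of context
```

The final period anchors future comparisons; the full series is what lets the analyzer call a trend rather than a single delta.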

Bring the role's real data
Sample data is fine for the tutorial; real insight requires real data. Export from the system the role actually uses — CRM for sales, web and email analytics for content, ticketing for support, the ERP or SLA dashboard for ops. The analyzer reads whatever the role already produces.

From here on, every time the analyzer runs, the Project pulls baseline values from Memory and compares new data against them. You do not re-paste the baseline. It persists. That is the feedback loop: the pipeline produces, the analyzer reads the output, results flow back into Memory, and the next run opens with the prior run as context. Memory is not just storage. It is the surface on which the Project's understanding of "what worked" accumulates.

Scheduled Tasks as KPI snapshots
You can register a Scheduled Task that saves a dated KPI snapshot as a file (for example, pipeline/kpi-snapshots/{YYYY-MM-DD}.md) on the cadence the role reports on — weekly, monthly, or both. The snapshot is the durable record the next run reads; Memory holds the rolling baseline, and the saved snapshot file holds the history. Scheduled Tasks are covered end-to-end in Part 7; the pointer here is that the analyzer's outputs are full-fledged pipeline artifacts like any other stage.

The Reporting Framework

Data without rhythm becomes noise. You check once, take a snapshot, forget about it until something breaks. A reporting framework turns measurement from an event into a habit.

Three cadences cover what any role needs.

Weekly: The Pulse Check

Once a week, spend five minutes on the basics. You are looking for anomalies, not trends. Did the primary workflow metric fall off a cliff on Wednesday? Did a queue back up? Did one output surface produce ten times its normal volume?

Prompt the Project:

Here is this week's data: [paste quick numbers or a screenshot summary]

Give me a 3-sentence pulse check bounded to this week. Anything unusual?
Anything to act on this week?

The weekly check is not an analysis. It is a smoke detector. If nothing unusual happened, the answer is "steady week, no action needed" and you move on.

Monthly: The Full Review

Once a month, run the Metrics Analyzer skill against the latest data. Compare to baseline, look at trends over the month's window, decide whether to adjust.

Run the metrics-analyzer skill with this data:

[paste the month's CSV or data summary]

Compare against our stored baseline. What improved? What declined? What
do you recommend we change, and to which surface (channel, stage, queue,
rule, skill, agent)?

The monthly report is the primary decision-making tool. Role-specific reads:

  • Sales: a win rate down 8 points over the month while ACV holds steady usually means leads are entering the pipeline too early — tighten the Lead Qualifier's handoff threshold.
  • Content / Brand: briefs-to-published falling below 70% while brief volume holds steady means the quality check or the distribution queue is the bottleneck, not authoring.
  • Support: first-contact resolution dropping 10 points while volume is flat usually means a new product issue is surfacing under several ticket titles — cluster the escalations and write a rule.
  • Ops: SLA adherence slipping in one process while others hold means the process cycle time has exceeded its budget — re-scope or add a stage.

Same cadence, same Skill, role-specific action.

Quarterly: The Strategy Review

Every three months, step back from KPI-level data and ask bigger questions. Are the goals in CLAUDE.md being met over the quarter? Has the operating context shifted? Should the Project add a surface, drop one, or change the mix? This is where the Project earns its keep as a strategist, not just an analyst.

Review our last three monthly reports and our original goals from
CLAUDE.md. Over the last quarter: are we on track? What should change
for next quarter?

Produce a quarterly strategy brief with:
- Progress against each goal (on track / behind / ahead) over the quarter
- One thing to stop doing
- One thing to start doing
- One thing to keep doing

The quarterly review connects KPI data back to the goals stated in CLAUDE.md, and recommends changes to the plan — not just the tactics. It also writes new context back into Memory — what worked over the quarter, what did not, what next quarter should prioritize. That is how the Project's understanding of the role sharpens: each review becomes the baseline for the next one.

What triggers action between cadences? Three scenarios should prompt an immediate review, regardless of schedule:

  • Any KPI moves 20% or more from baseline in a single reporting period. Something broke, or something worked; either way, find out which.
  • A surface consistently underperforms for three consecutive periods. Flat for one period is fine. Flat for a quarter means the surface is not working.
  • A surface suddenly outperforms expectations. A KPI 30% above baseline is not just good news — it is a signal to investigate the cause and do more of it deliberately.
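The three triggers can be expressed as a single check. A hedged sketch, with the thresholds taken from the list above and everything else invented for illustration:

```python
def needs_immediate_review(series: list[float], baseline: float,
                           target_low: float) -> list[str]:
    """Return which between-cadence triggers fire for one KPI.

    series: per-period values, oldest first. Thresholds mirror the
    text: a 20% move, three consecutive under-target periods, and a
    30% outperform (which also implies the 20% flag).
    """
    reasons = []
    current = series[-1]
    if baseline and abs(current - baseline) / abs(baseline) >= 0.20:
        reasons.append("moved 20%+ from baseline this period")
    if len(series) >= 3 and all(v < target_low for v in series[-3:]):
        reasons.append("under target for three consecutive periods")
    if baseline and current >= baseline * 1.30:
        reasons.append("30%+ above baseline -- find the cause, repeat it")
    return reasons

# A declining KPI trips the first two triggers at once.
print(needs_immediate_review([0.20, 0.19, 0.18],
                             baseline=0.24, target_low=0.25))
```

An empty list means the regular cadence is enough; anything else means do not wait for the monthly review.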

Off-Ramp 3: What You Have Built

What you have built: A scoped, approval-gated pipeline with measurement attached. The Project knows what it is producing, what is working, what is not, and where to focus. For a role most companies run on vibes and quarterly guesswork, this is a step change.

What is ahead: Part 7 puts recurring work on the Scheduled Tasks calendar, demonstrates the pattern transferring to a second role by forking the Project and swapping CLAUDE.md, and closes the series. The pipeline is already producing measurable results.

This is a real stopping point, and a good one. The Project knows the business, the audience, and the voice. It follows rules and gates every write. The pipeline from Part 5 drafts, checks, sequences, and hands off. Measurement now closes the loop — the pipeline produces numbers, the analyzer reads them, and the next cycle opens with the last one as context.

Most 20-to-500-person companies do not have this for the role in question. They review the function once a quarter when someone asks "how is it going?" and go on vibes for the other eleven weeks. You have a system, data, and a feedback loop that tells you what the data means and what to do about it.

If you stop here, the role is running on evidence. The pipeline produces. The measurement tells you whether the production matters.


What just changed

Before this exercise, the Cowork Project shipped work but had no repeatable instrument for reading whether the work was landing. You walked /skill-creator through the Metrics Analyzer specification and approved the save. Cowork wrote .claude/skills/metrics-analyzer/SKILL.md and stored the opening numbers in the Memory entry kpi-baseline. Future analyzer runs load the Skill, compare new numbers against that baseline over a stated observation window, and flag real movement. The loop is closed: pipeline output becomes Memory input, and the next review opens where this one left off.


What is Next

The pipeline works. It can be measured. The remaining step is to stop starting every run by hand.

Right now the pipeline runs when you open Cowork. You paste data when you remember. You trigger the analyzer when you have time. Every stage works, but every stage depends on you initiating it.

Part 7 changes that. You will put recurring work on Cowork's Scheduled Tasks calendar — the KPI snapshot, the weekly pulse, the monthly review — and the Project wakes up on schedule and leaves dated files for you to review. The same Part demonstrates the pattern transferring to a second role by forking the Project and swapping CLAUDE.md, and closes the series.