Capabilities as code for AI-native review

Review what changed in behavior, not only what changed in code.

CapabilityKit helps reviewers understand AI-generated changes by connecting code diffs back to intended behavior, verification evidence, and downstream impact.

Plans often diverge from implementation. Code diffs rarely explain intent.

AI agents make implementation cheaper, but they also make it easier for important product decisions to disappear into generated code. CapabilityKit keeps the durable part of the plan: what the system is supposed to do, where it is implemented, how deeply it is verified, and what depends on it.

Video walkthroughs

Watch the CapabilityKit playlist.

Follow the video series to see capability files, status summaries, diffs, implementation assessment, impact analysis, and story-map views in context.

Core concepts

Capabilities are requirements that stay close to the code.

CapabilityKit is not a second issue tracker. It turns durable product behavior into version-controlled files that reviewers, tests, and coding agents can inspect together.

01

Capability as contract

A capability records what the system should do, why it matters, and which acceptance criteria must stay true as the code changes.

02

Evidence beside intent

Implementation references point to source, tests, docs, routes, handlers, and workflows so claims can be reviewed against code.

03

Verification depth

Automated checks, manual review, known gaps, and saved agent review evidence make confidence visible instead of implied.

04

Code impact

Changed files can be mapped back to referenced capabilities and downstream dependents, narrowing what needs retesting.

05

PR behavior review

Capability diffs show what should be added, removed, or changed, while impact and verification show what might be at risk.

06

Story-map slices

Release, backbone, and step metadata organize progressive slices without separating planning from delivery evidence.
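
The "Code impact" idea above can be sketched in a few lines. This is a minimal illustration, assuming each capability lists its implementation references as repo-relative file paths. The `CapabilityRefs` shape, the `capabilitiesForChangedFiles` function, and the file paths are hypothetical and are not taken from CapabilityKit's API.

```typescript
// Hypothetical shape: a capability ID plus the files it references.
interface CapabilityRefs {
  id: string;
  references: string[]; // repo-relative paths (assumed format)
}

// Return the IDs of capabilities that reference at least one changed file.
function capabilitiesForChangedFiles(
  capabilities: CapabilityRefs[],
  changedFiles: string[],
): string[] {
  const changed = new Set(changedFiles);
  return capabilities
    .filter((cap) => cap.references.some((ref) => changed.has(ref)))
    .map((cap) => cap.id)
    .sort();
}

// Example with made-up source paths: only the first capability is touched.
const touched = capabilitiesForChangedFiles(
  [
    { id: "core/graph/diff-capabilities", references: ["packages/cli/src/diff.ts"] },
    { id: "core/model/define-capability-format", references: ["packages/core/src/schema.ts"] },
  ],
  ["packages/cli/src/diff.ts"],
);
```

The resulting list is the starting set for dependency impact: each touched capability can then be expanded to its downstream dependents.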

Demo narrative

What the YouTube walkthrough shows

The walkthrough starts with the problem of requirements drifting away from code, opens one capability file, then runs the status, diff, assess, impact, and story-map commands to show how capability review works.

node packages/cli/dist/index.js status
node packages/cli/dist/index.js diff --base HEAD
node packages/cli/dist/index.js assess core/assessment/assess-implementation-coverage
node packages/cli/dist/index.js impact core/model/define-capability-format
node packages/cli/dist/index.js status --story-map --release pr-review

Developer process

A review loop for capability changes.

CapabilityKit is designed for the moment when a developer or reviewer needs to understand an AI-assisted change without reconstructing requirements from prompts, stale plans, and implementation details.

1

Read the capability diff

See added, changed, and removed capability intent, acceptance, verification, references, and review policy.

2

Assess implementation coverage

Compare every acceptance criterion with evidence from the referenced source, test, and documentation files.

3

Inspect dependency impact

Traverse direct and transitive dependents to find related capabilities that may need checks or review.

4

Grow verification

Add tests, manual review evidence, agent review results, or explicit accepted gaps before confidence decays.

Why this matters

Planning documents are not enough after an agent writes the code.

  • Plans often record code decisions, not the lasting capability contract.
  • Generated implementation can drift from the plan before review begins.
  • Reviewers need to know whether new and existing capabilities are actually verified.
  • A simple capability edit can change downstream agent prompts, CLI behavior, compiled artifacts, or docs.

Capability diff

What behavior changed?

`capabilitykit diff` summarizes intent, acceptance, verification, implementation reference, and ignore policy changes against a Git base. It excludes noisy saved review evidence by default.

capabilitykit diff --base HEAD
 intent changed
 acceptance +2/-0
 verification +1/-0
 Impact: 3 direct, 7 transitive
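
A field-level comparison like the summary above could look roughly like this. It is a sketch under stated assumptions: the `Capability` shape is trimmed to two fields, and `diffCapabilities` is a hypothetical name, not the CLI's actual implementation. It compares parsed capabilities rather than raw YAML text.

```typescript
// Hypothetical, trimmed-down capability shape for illustration only.
interface Capability {
  intent: string;
  acceptance: string[];
}

type CapabilityMap = Record<string, Capability>;

interface CapabilityChange {
  id: string;
  kind: "added" | "removed" | "changed";
  intentChanged: boolean;
  acceptanceAdded: number;
  acceptanceRemoved: number;
}

// Compare parsed capabilities field by field instead of diffing raw YAML.
function diffCapabilities(base: CapabilityMap, current: CapabilityMap): CapabilityChange[] {
  const ids = Array.from(new Set([...Object.keys(base), ...Object.keys(current)])).sort();
  const changes: CapabilityChange[] = [];

  for (const id of ids) {
    const before: Capability | undefined = base[id];
    const after: Capability | undefined = current[id];
    const oldCriteria = before?.acceptance ?? [];
    const newCriteria = after?.acceptance ?? [];

    const acceptanceAdded = newCriteria.filter((c) => !oldCriteria.includes(c)).length;
    const acceptanceRemoved = oldCriteria.filter((c) => !newCriteria.includes(c)).length;
    const intentChanged = before?.intent !== after?.intent;

    // Skip capabilities present on both sides with no tracked change.
    if (before && after && !intentChanged && acceptanceAdded === 0 && acceptanceRemoved === 0) {
      continue;
    }

    const kind = !before ? "added" : !after ? "removed" : "changed";
    changes.push({ id, kind, intentChanged, acceptanceAdded, acceptanceRemoved });
  }
  return changes;
}

// Example: intent rewritten and two acceptance criteria added.
const changes = diffCapabilities(
  { "core/example": { intent: "Old intent", acceptance: ["Reports added capabilities."] } },
  {
    "core/example": {
      intent: "New intent",
      acceptance: [
        "Reports added capabilities.",
        "Reports removed capabilities.",
        "Reports changed capabilities.",
      ],
    },
  },
);
```

Here `changes` holds one entry for `core/example` with a changed intent, two added criteria, and none removed.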

Verification depth

How strong is the evidence?

`capabilitykit assess` reads declared implementation references and places each acceptance criterion beside concrete evidence. Uncertain findings stay visible until semantic review, tests, or accepted gaps resolve them.

covered: status summary exists
uncertain: impact evidence found
uncovered: no semantic review saved
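
One way to think about the three labels above is as a small classification rule. The rule below is an assumption made for illustration, not CapabilityKit's documented logic: no evidence means uncovered, evidence that nobody has confirmed means uncertain, and confirmed evidence means covered. The `CriterionEvidence` shape is hypothetical.

```typescript
type Coverage = "covered" | "uncertain" | "uncovered";

// Hypothetical input: one acceptance criterion plus the evidence found for it.
interface CriterionEvidence {
  criterion: string;
  evidence: string[]; // referenced files that appear to relate to the criterion
  confirmed: boolean; // assumed flag: a test or semantic review backs the evidence
}

// Assumed rule: no evidence -> uncovered; unconfirmed evidence -> uncertain.
function classify(item: CriterionEvidence): Coverage {
  if (item.evidence.length === 0) return "uncovered";
  return item.confirmed ? "covered" : "uncertain";
}

// Count criteria per label so the weaker findings stay visible in a summary.
function summarize(items: CriterionEvidence[]): Record<Coverage, number> {
  const totals: Record<Coverage, number> = { covered: 0, uncertain: 0, uncovered: 0 };
  for (const item of items) {
    totals[classify(item)] += 1;
  }
  return totals;
}

const totals = summarize([
  { criterion: "Status summary exists", evidence: ["src/status.ts"], confirmed: true },
  { criterion: "Impact evidence found", evidence: ["src/impact.ts"], confirmed: false },
  { criterion: "Semantic review saved", evidence: [], confirmed: false },
]);
```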

Impact graph

What else may break?

`capabilitykit impact` follows explicit `agent.depends_on` relationships and collects suggested automated checks, manual review steps, and known verification gaps across the impacted set.

Direct dependents: 5
Transitive dependents: 9
Suggested checks: npm test, compile
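
The traversal behind an impact report can be sketched as a breadth-first walk over inverted `depends_on` edges. This is an illustrative sketch, not the CLI's code: `findDependents` is a hypothetical name, the dependency edges in the example are invented, and "transitive" is assumed here to mean dependents reached only indirectly.

```typescript
// capability ID -> IDs listed under its `agent.depends_on`
type DependsOn = Record<string, string[]>;

interface ImpactSet {
  direct: string[];
  transitive: string[];
}

function findDependents(dependsOn: DependsOn, changedId: string): ImpactSet {
  // Invert the edges so each capability maps to the capabilities relying on it.
  const dependents = new Map<string, string[]>();
  for (const [id, deps] of Object.entries(dependsOn)) {
    for (const dep of deps) {
      const list = dependents.get(dep) ?? [];
      list.push(id);
      dependents.set(dep, list);
    }
  }

  // Breadth-first walk; `seen` guards against cycles and duplicates.
  const seen = new Set<string>([changedId]);
  const reached: string[] = [];
  const queue: string[] = [changedId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of dependents.get(current) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        reached.push(next);
        queue.push(next);
      }
    }
  }

  const direct = new Set(dependents.get(changedId) ?? []);
  return {
    direct: reached.filter((id) => direct.has(id)).sort(),
    transitive: reached.filter((id) => !direct.has(id)).sort(),
  };
}

// Invented edges between capability IDs that appear on this page.
const impact = findDependents(
  {
    "core/validation/validate-capability-files": ["core/model/define-capability-format"],
    "core/graph/compile-capabilities": ["core/validation/validate-capability-files"],
    "core/graph/diff-capabilities": ["core/graph/compile-capabilities"],
  },
  "core/model/define-capability-format",
);
```

With these edges, `impact.direct` contains the validation capability and `impact.transitive` contains the two graph capabilities. Suggested checks and known gaps could then be collected by iterating over both lists.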

Supported coding agents

Use the coding agent already in your workflow.

CapabilityKit can hand a capability assessment bundle to supported coding-agent CLIs. Semantic review inspects implementation evidence and saves structured `agent.review` metadata without editing implementation code.

Local CLI

Codex

Runs `codex exec` with the assessment prompt on stdin.

capabilitykit verify core/example --agent codex

Local CLI

GitHub Copilot

Runs Copilot CLI in response-only prompt mode.

capabilitykit verify core/example --agent copilot

Local CLI

Pi Coding Agent

Runs Pi in ephemeral print mode without saving a session.

capabilitykit verify core/example --agent pi

Local CLI

Claude Code

Runs Claude Code in non-interactive print mode.

capabilitykit verify core/example --agent claude

Local CLI

Cursor CLI

Runs Cursor CLI in headless mode without enabling `--force`.

capabilitykit verify core/example --agent cursor-agent

Extensible

Custom command

Use explicit arguments and stdin, prompt argument, or prompt-file handoff.

capabilitykit review core/example --agent your-agent
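
The three handoff styles named above (stdin, prompt argument, prompt file) differ only in where the prompt ends up. The sketch below shows that difference without spawning a process. The `AgentCommand` config shape, the `buildInvocation` helper, and the example flags are all hypothetical and do not describe CapabilityKit's real configuration format.

```typescript
type Handoff = "stdin" | "prompt-arg" | "prompt-file";

// Hypothetical description of a custom agent command.
interface AgentCommand {
  command: string;
  args: string[]; // explicit arguments always passed to the agent
  handoff: Handoff;
}

// What a runner would need in order to start the agent process.
interface Invocation {
  command: string;
  args: string[];
  stdin?: string;
}

function buildInvocation(agent: AgentCommand, prompt: string, promptFilePath: string): Invocation {
  switch (agent.handoff) {
    case "stdin":
      // Prompt is piped to the process; arguments stay untouched.
      return { command: agent.command, args: [...agent.args], stdin: prompt };
    case "prompt-arg":
      // Prompt is appended as the final argument.
      return { command: agent.command, args: [...agent.args, prompt] };
    case "prompt-file":
      // Prompt was written to a file beforehand; only its path is passed.
      return { command: agent.command, args: [...agent.args, promptFilePath] };
  }
}

// Example with a made-up agent binary and flag.
const viaFile = buildInvocation(
  { command: "your-agent", args: ["--read-only"], handoff: "prompt-file" },
  "Assess core/example against its acceptance criteria.",
  "/tmp/assessment-prompt.md",
);
```

Passing arguments as an array rather than a single shell string avoids quoting problems when the prompt contains spaces or special characters.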

Local verification

One command checks the repo before review or release.

`npm run verify` builds the workspaces, runs tests, validates capability files, and refreshes the compiled capability map. It is the same local loop used by release preparation before publishing is triggered.

npm run verify

npm run release:prep -- patch

Repo-native structure

The capability map is hierarchical before it is a graph.

Capability files live in `.capabilities/` using folders that mirror ownership, product areas, or platform layers. The hierarchy gives reviewers a readable map before they inspect dependency edges.

.capabilities/
  capabilitykit.yaml
  core/
    model/
      define-capability-format.capability.yaml
    validation/
      validate-capability-files.capability.yaml
      detect-verification-gaps.capability.yaml
    graph/
      compile-capabilities.capability.yaml
      diff-capabilities.capability.yaml
      analyze-capability-impact.capability.yaml
    assessment/
      assess-implementation-coverage.capability.yaml
  developer-experience/
    cli/
    skills/
  docs/
    project/
    reference/
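
The CLI examples on this page pass IDs such as `core/model/define-capability-format`, which line up with the folder layout above. A minimal sketch of that path-to-ID mapping follows, assuming the ID is the path under `.capabilities/` with the `.capability.yaml` suffix removed. `capabilityIdFromPath` is a hypothetical helper, not part of the published CLI.

```typescript
const CAPABILITY_ROOT = ".capabilities/";
const CAPABILITY_SUFFIX = ".capability.yaml";

// Derive a capability ID from a file path, or return null for other files.
function capabilityIdFromPath(filePath: string): string | null {
  // Normalize Windows separators so IDs always use forward slashes.
  const normalized = filePath.replace(/\\/g, "/");
  const start = normalized.indexOf(CAPABILITY_ROOT);
  if (start === -1 || !normalized.endsWith(CAPABILITY_SUFFIX)) {
    return null;
  }
  return normalized.slice(
    start + CAPABILITY_ROOT.length,
    normalized.length - CAPABILITY_SUFFIX.length,
  );
}

const derivedId = capabilityIdFromPath(
  ".capabilities/core/model/define-capability-format.capability.yaml",
);
// The root config file is not a capability, so it yields null.
const configId = capabilityIdFromPath(".capabilities/capabilitykit.yaml");
```

Deriving the ID this way means moving a file changes its ID, so a rename is visible to anything that depends on the old ID.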

Capability anatomy

The human-facing fields describe the contract.

Keep the human-authored section short and focused on product behavior. IDs and area can be derived from file paths, and agent-maintained references or review metadata can be appended later.

title: Diff capabilities
status: implemented
summary: Compare current capability files with a Git base and summarize added,
  changed, and removed capabilities.
intent: Help developers understand product and agent-facing intent changes
  without reading raw YAML diffs.
acceptance:
  - Compares current capabilities against a configurable Git base ref.
  - Reports added, changed, and removed capabilities by capability ID.
  - Summarizes meaningful field changes such as status, intent, acceptance,
    dependencies, implementation references, verification, and ignore policy.
  - Includes downstream impact context for changed capabilities.
guidance:
  - Compare normalized parsed capabilities, not raw YAML text.
  - Avoid raw JSON in the default human output.

Story mapping

Capabilities can grow across release slices.

A capability does not have to ship as one large unit. CapabilityKit lets teams attach story-map metadata to each capability so a first release can prove a thin, coherent outcome, then later releases can deepen the map without moving files or losing implementation evidence.

Release

Plan by release slice

`planning.story_map.release` keeps MVP, website, story-mapping, and follow-on slices visible as roadmap data instead of folder structure.

Backbone + step

Group by release, backbone, and step

Story-map metadata groups capabilities by release, backbone, and step, so reviewers can see which part of the outcome each capability strengthens.

Status + evidence

Keep delivery attached

Each story-map card still carries capability status, verification gaps, and implementation references, keeping release conversations connected to actual delivery.

Figure: CapabilityKit story map grouped by release, backbone, and step.

The generated story-map viewer is a concrete example of progressive capability planning. A thin first slice proves the end-to-end path, and later slices add depth while status, verification risk, and implementation traceability stay visible.

planning:
  story_map:
    backbone: Project communication
    step: Explain story mapping
    release: website
    order: 20

Release slicing is metadata.

The capability ID still comes from its file path. Story-map fields describe how the capability contributes to a release narrative, which lets teams split work progressively without rewriting the capability map.
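
Because the slice is plain metadata, a story-map view can be built by grouping on the `planning.story_map` fields. The sketch below assumes the four fields shown in the YAML sample (`release`, `backbone`, `step`, `order`). The `groupStoryMap` function and the capability IDs in the example are hypothetical.

```typescript
// Flattened view of one capability's `planning.story_map` block plus its ID.
interface StoryMapEntry {
  id: string;
  release: string;
  backbone: string;
  step: string;
  order: number;
}

// release -> backbone -> step -> capability IDs in display order
type StoryMap = Record<string, Record<string, Record<string, string[]>>>;

function groupStoryMap(entries: StoryMapEntry[]): StoryMap {
  const map: StoryMap = {};
  // Sort a copy by `order` so each step lists capabilities in sequence.
  const sorted = [...entries].sort((a, b) => a.order - b.order);
  for (const entry of sorted) {
    const release = (map[entry.release] ??= {});
    const backbone = (release[entry.backbone] ??= {});
    const step = (backbone[entry.step] ??= []);
    step.push(entry.id);
  }
  return map;
}

// Example IDs are invented; the field values mirror the YAML sample.
const storyMap = groupStoryMap([
  {
    id: "docs/project/explain-story-mapping",
    release: "website",
    backbone: "Project communication",
    step: "Explain story mapping",
    order: 20,
  },
  {
    id: "docs/project/introduce-capabilities",
    release: "website",
    backbone: "Project communication",
    step: "Explain story mapping",
    order: 10,
  },
]);
```

Nothing in the grouping touches file paths, which is why a capability can move between release slices without changing its ID.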

Capability dependency graph

Small changes can have wide capability impact.

Folders help teams navigate ownership, but explicit dependencies tell reviewers what behavior relies on a capability. That graph turns a local change into an impact report with checks and manual review guidance.

Open source

Start reviewing capabilities as code.

Add a `.capabilities/` folder, validate the map, diff capability changes, assess implementation coverage, and use the dependency graph to review impact.

npm i -g @capabilitykit/cli
capabilitykit skill
capabilitykit init