harness-skills-plugin
A Claude Code plugin implementing the multi-agent build harness architecture from Anthropic Engineering: Harness Design for Long-Running Apps.
Extends the original pattern with a codebase auditor for existing projects, file-based handoffs that survive context resets, and a polish phase for iterative refinement.
Architecture
flowchart TD
U([User Prompt]) --> O
subgraph Harness["Claude Code — Harness Skills"]
O["🎯 Orchestrator\nharness-orchestrator\nSequencing · Gates · Handoffs"]
O -->|new project| P
O -->|existing project| A
A["🔍 Auditor\nharness-auditor\nCodebase Analysis · Tech Debt · Constraints"]
A -->|CODEBASE_AUDIT.md| P
P["📋 Planner\nharness-planner\nProduct Spec · Calibration Anchors · Sprint Plan"]
P -->|PRODUCT_SPEC.md| B
B["🔨 Builder\nharness-builder\nSprint Execution · Git Commits · Sprint Contracts"]
B -->|source code| E
E["✅ Evaluator\nharness-evaluator\nQA · Scoring · Pivot Suggestions · Polish Mode"]
E -->|pass| O
E -->|retry| B
E -->|polish loop| B
end
subgraph State[".harness/ — File-Based State"]
direction LR
H[HANDOFF.md]
S[PRODUCT_SPEC.md]
Q[QA_REPORT.md]
C[config.yaml]
end
O <-->|read/write| State
P <-->|read/write| State
B <-->|read/write| State
E <-->|read/write| State
O -->|complete| Done([Shipped Product])
Key insight from the Anthropic post: separated evaluation beats self-evaluation. The evaluator reads builder output cold, from files, with a skeptical prompt — it never shares conversation context with the builder.
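The builder/evaluator routing in the diagram can be sketched as a small loop. This is an illustrative sketch only, not the plugin's code: `run_sprint`, `stub_build`, and `stub_evaluate` are hypothetical names, the polish loop is left out, and stopping once retries are exhausted is an assumption. The retry limit corresponds to `max_sprint_retries` in `.harness/config.yaml`.

```python
from typing import Callable


def run_sprint(
    build: Callable[[str], str],
    evaluate: Callable[[str], tuple[bool, str]],
    max_retries: int,
) -> tuple[bool, int]:
    """Build, evaluate, and retry with evaluator feedback.

    Returns (passed, attempts). The evaluator only ever sees the built
    artifact, never the builder's internal state.
    """
    feedback = ""
    for attempt in range(1, max_retries + 2):  # first attempt plus retries
        artifact = build(feedback)
        passed, feedback = evaluate(artifact)
        if passed:
            return True, attempt
    # Assumption: give up (and hand back to a human) when retries run out.
    return False, max_retries + 1


def stub_build(feedback: str) -> str:
    # Hypothetical builder: applies fixes only when it receives feedback.
    return "draft" if not feedback else "draft + fixes"


def stub_evaluate(artifact: str) -> tuple[bool, str]:
    # Hypothetical evaluator: fails the first draft with one QA finding.
    if "fixes" in artifact:
        return True, ""
    return False, "login form does not submit"


result = run_sprint(stub_build, stub_evaluate, max_retries=2)
print(result)  # (True, 2)
```

The evaluator's verdict is the only thing that flows back to the builder, which mirrors the `retry` edge in the diagram.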
Skills
| Skill | Role | Key Outputs |
|---|---|---|
| `harness-orchestrator` | Entry point, sequencing, manual gates | config.yaml, HANDOFF.md |
| `harness-auditor` | Codebase analysis for existing projects | CODEBASE_AUDIT.md |
| `harness-planner` | Prompt → product spec + sprint plan | PRODUCT_SPEC.md, CALIBRATION.md |
| `harness-builder` | Spec → working code, sprint by sprint | Source code, git commits |
| `harness-evaluator` | QA, grading, feedback, polish loop | QA_REPORT.md |
| `harness-brief` | Standalone project brief generator | PROJECT_BRIEF.md |
How File-Based Handoffs Work
Each agent reads its inputs from .harness/ files and writes its outputs there — no conversation context is shared between phases. This means:
- Context resets are safe: resume any time with `Resume the harness`
- Subagent-ready: each phase can run as a separate Claude subagent
- Human-steerable: edit `.harness/` files directly to change scope, thresholds, or sprint plans mid-run
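The handoff pattern above can be sketched in Python as follows. This is a minimal illustration, not the plugin's implementation: the `run_phase` helper and the file contents are hypothetical, and only the `.harness/` directory and the file names `PRODUCT_SPEC.md` and `HANDOFF.md` come from this README.

```python
import tempfile
from pathlib import Path


def run_phase(harness_dir: Path, reads: list[str], writes: str, work) -> None:
    """Run one phase: load inputs from files, write the output to a file.

    Hypothetical helper. Nothing is passed between phases in memory, so a
    phase can run in a fresh process (or a fresh context) at any time.
    """
    inputs = {name: (harness_dir / name).read_text() for name in reads}
    (harness_dir / writes).write_text(work(inputs))


# Use a throwaway directory so the sketch does not touch a real project.
project = Path(tempfile.mkdtemp())
harness = project / ".harness"
harness.mkdir()

# "Planner": no file inputs in this sketch, writes the spec.
run_phase(harness, [], "PRODUCT_SPEC.md", lambda _: "# Spec\n- Sprint 1: dashboard\n")

# "Builder": sees only what is on disk, then records a handoff note.
run_phase(
    harness,
    ["PRODUCT_SPEC.md"],
    "HANDOFF.md",
    lambda files: "Built from spec:\n" + files["PRODUCT_SPEC.md"],
)

handoff_text = (harness / "HANDOFF.md").read_text()
print(handoff_text)
```

Because every input is re-read from disk, editing `PRODUCT_SPEC.md` by hand between the two calls would change what the second phase sees, which is the property that makes the run human-steerable.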
Installation
/plugin marketplace add woink/harness-skills-plugin
/plugin install harness-skills@harness-skills-plugin
Usage
New project:
Spin up the harness: Build a real-time dashboard for NYC park conditions.
Existing project:
Run the harness on this project. I want to add real-time notifications.
Resume after context reset:
Resume the harness.
Unattended multi-session run:
bash scripts/run-harness.sh --max-iterations 20
Configuration
.harness/config.yaml controls thresholds and gates:
harness_version: 1
project_mode: greenfield # greenfield | enhance | refactor | rescue
eval_mode: hybrid # playwright | automated-tests | manual-gates | hybrid
max_sprint_retries: 2
qa_pass_threshold:
  functionality: 7   # hard floor: does it work?
  design_quality: 6
  code_quality: 6
  product_depth: 6
manual_gates:
  after_audit: true
  after_spec: true
  after_each_sprint: false
  after_final_qa: true
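As a rough illustration of how `qa_pass_threshold` might gate a sprint, the sketch below compares a set of scores against the thresholds shown above. The `failing_criteria` function, the sample scores, and the all-criteria-must-pass rule are assumptions made for this example; the evaluator skill may apply the thresholds differently.

```python
# Thresholds copied from the config above; the scores below are made-up sample data.
QA_PASS_THRESHOLD = {
    "functionality": 7,
    "design_quality": 6,
    "code_quality": 6,
    "product_depth": 6,
}


def failing_criteria(scores: dict[str, int], thresholds: dict[str, int]) -> list[str]:
    """Return the criteria whose score is below its threshold.

    Assumption: a sprint passes only if every criterion meets its threshold.
    A missing score is treated as 0, which counts as a failure.
    """
    return [name for name, minimum in thresholds.items() if scores.get(name, 0) < minimum]


sample_scores = {
    "functionality": 8,
    "design_quality": 5,
    "code_quality": 7,
    "product_depth": 6,
}
failed = failing_criteria(sample_scores, QA_PASS_THRESHOLD)
print("pass" if not failed else f"retry: {failed}")  # retry: ['design_quality']
```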
Cost Estimates
| Phase | Typical Cost | Duration |
|---|---|---|
| Auditor | $1–5 | 5–15 min |
| Planner | $0.50–2 | 3–10 min |
| Builder (per sprint) | $20–70 | 30–120 min |
| Evaluator (per sprint) | $3–5 | 5–10 min |
A 5-sprint build with 1 retry averages $100–200 total.
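To budget a run of a different size, the per-phase ranges in the table can be combined into lower and upper bounds. The sketch below is only an estimate under stated assumptions: the `estimate` function is hypothetical, the project is new (no Auditor run), and each retry is counted as one extra Builder run plus one extra Evaluator run.

```python
# (low, high) cost in USD, copied from the table above.
PLANNER = (0.50, 2.0)
BUILDER_PER_SPRINT = (20.0, 70.0)
EVALUATOR_PER_SPRINT = (3.0, 5.0)


def estimate(sprints: int, retries: int) -> tuple[float, float]:
    """Return (low, high) total cost bounds for a new-project build."""
    # Assumption: every retry repeats one builder run and one evaluator run.
    runs = sprints + retries
    low = PLANNER[0] + runs * (BUILDER_PER_SPRINT[0] + EVALUATOR_PER_SPRINT[0])
    high = PLANNER[1] + runs * (BUILDER_PER_SPRINT[1] + EVALUATOR_PER_SPRINT[1])
    return low, high


low, high = estimate(sprints=5, retries=1)
print(f"${low:.2f} to ${high:.2f}")  # $138.50 to $452.00
```

These are worst-case and best-case bounds rather than an average, so they are wider than the typical total quoted above, which sits near the low end of this range.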