mirror of https://github.com/woink/harness-skills-plugin.git synced 2026-04-30 08:00:42 -07:00

Ward's Claude harness



harness-skills-plugin

A Claude Code plugin implementing the multi-agent build harness architecture described in the Anthropic Engineering post "Harness Design for Long-Running Apps."

Extends the original pattern with a codebase auditor for existing projects, file-based handoffs that survive context resets, and a polish phase for iterative refinement.

Architecture

```mermaid
flowchart TD
    U([User Prompt]) --> O

    subgraph Harness["Claude Code — Harness Skills"]
        O["🎯 Orchestrator\nharness-orchestrator\nSequencing · Gates · Handoffs"]

        O -->|new project| P
        O -->|existing project| A
        A["🔍 Auditor\nharness-auditor\nCodebase Analysis · Tech Debt · Constraints"]
        A -->|CODEBASE_AUDIT.md| P

        P["📋 Planner\nharness-planner\nProduct Spec · Calibration Anchors · Sprint Plan"]
        P -->|PRODUCT_SPEC.md| B

        B["🔨 Builder\nharness-builder\nSprint Execution · Git Commits · Sprint Contracts"]
        B -->|source code| E

        E["✅ Evaluator\nharness-evaluator\nQA · Scoring · Pivot Suggestions · Polish Mode"]

        E -->|pass| O
        E -->|retry| B
        E -->|polish loop| B
    end

    subgraph State[".harness/ — File-Based State"]
        direction LR
        H[HANDOFF.md]
        S[PRODUCT_SPEC.md]
        Q[QA_REPORT.md]
        C[config.yaml]
    end

    O <-->|read/write| State
    P <-->|read/write| State
    B <-->|read/write| State
    E <-->|read/write| State

    O -->|complete| Done([Shipped Product])
```

Key insight from the Anthropic post: separated evaluation beats self-evaluation. The evaluator reads the builder's output cold, from files, with a skeptical prompt, and never shares conversation context with the builder.
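The flowchart's edges can be read as a small transition table. The sketch below is hypothetical and is not the orchestrator skill's actual logic: `next_phase` and the event names `new-project`, `existing-project`, and `ok` are illustrative assumptions, while `pass`, `retry`, `polish` (from "polish loop"), and `complete` come from the diagram's edge labels.

```shell
# Hypothetical sketch: the flowchart's edges as a transition table.
# next_phase CURRENT EVENT prints the phase to run next, or fails on an
# edge that does not appear in the diagram.
next_phase() {
  local current event
  current=$1
  event=$2
  case "$current:$event" in
    orchestrator:new-project)          echo planner ;;
    orchestrator:existing-project)     echo auditor ;;
    orchestrator:complete)             echo done ;;
    auditor:ok)                        echo planner ;;      # hands off CODEBASE_AUDIT.md
    planner:ok)                        echo builder ;;      # hands off PRODUCT_SPEC.md
    builder:ok)                        echo evaluator ;;    # hands off source code
    evaluator:pass)                    echo orchestrator ;;
    evaluator:retry|evaluator:polish)  echo builder ;;      # retry and polish loop both return to the builder
    *)
      echo "unknown transition: $current:$event" >&2
      return 1
      ;;
  esac
}
```

For example, `next_phase evaluator retry` prints `builder`, matching the retry edge from the evaluator back to the builder.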

Skills

| Skill | Role | Key Outputs |
|---|---|---|
| `harness-orchestrator` | Entry point, sequencing, manual gates | `config.yaml`, `HANDOFF.md` |
| `harness-auditor` | Codebase analysis for existing projects | `CODEBASE_AUDIT.md` |
| `harness-planner` | Prompt → product spec + sprint plan | `PRODUCT_SPEC.md`, `CALIBRATION.md` |
| `harness-builder` | Spec → working code, sprint by sprint | Source code, git commits |
| `harness-evaluator` | QA, grading, feedback, polish loop | `QA_REPORT.md` |
| `harness-brief` | Standalone project brief generator | `PROJECT_BRIEF.md` |
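One way to sanity-check a phase against the table above is to confirm that its key outputs actually landed on disk. This is a hypothetical helper, not part of the plugin: it assumes every file output lives directly in `.harness/` (the handoff section says agents write their outputs there), including `PROJECT_BRIEF.md`, whose location the table does not state.

```shell
# Hypothetical helper: which .harness/ files each skill is expected to write.
expected_outputs() {
  case "$1" in
    harness-orchestrator) echo "config.yaml HANDOFF.md" ;;
    harness-auditor)      echo "CODEBASE_AUDIT.md" ;;
    harness-planner)      echo "PRODUCT_SPEC.md CALIBRATION.md" ;;
    harness-builder)      echo "" ;;  # writes source code and git commits, not .harness/ files
    harness-evaluator)    echo "QA_REPORT.md" ;;
    harness-brief)        echo "PROJECT_BRIEF.md" ;;  # location assumed
    *) return 1 ;;
  esac
}

# missing_outputs SKILL [DIR]: list expected files absent from DIR
# (default .harness). Fails for an unknown skill name.
missing_outputs() {
  local skill dir files f
  skill=$1
  dir=${2:-.harness}
  files=$(expected_outputs "$skill") || return 1
  for f in $files; do   # unquoted on purpose: split the space-separated list
    [ -f "$dir/$f" ] || echo "$f"
  done
}
```

Running `missing_outputs harness-planner` right after the planner finishes should print nothing; any filename it prints is a handoff the next phase would not find.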

How File-Based Handoffs Work

Each agent reads its inputs from files in `.harness/` and writes its outputs back there; no conversation context is shared between phases. This means:

- Context resets are safe: resume at any time with `Resume the harness`.
- Subagent-ready: each phase can run as a separate Claude subagent.
- Human-steerable: edit `.harness/` files directly to change scope, thresholds, or sprint plans mid-run.
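Because all state lives in files, a resume point can in principle be inferred from which files exist. The sketch below is a hypothetical heuristic, not the orchestrator's actual resume logic (which is not described here): `resume_point` is an illustrative name, and it assumes the auditor runs only when `project_mode` is something other than `greenfield`, in line with the diagram's "existing project" edge.

```shell
# Hypothetical heuristic: guess which phase to resume from the files
# present in DIR (default .harness).
resume_point() {
  local dir mode
  dir=${1:-.harness}
  if [ ! -f "$dir/config.yaml" ]; then
    echo orchestrator   # harness never initialized
    return
  fi
  # Pull the bare value out of a line like "project_mode: enhance  # ...".
  mode=$(sed -n 's/^project_mode:[[:space:]]*\([a-z]*\).*/\1/p' "$dir/config.yaml")
  mode=${mode:-greenfield}
  if [ "$mode" != greenfield ] && [ ! -f "$dir/CODEBASE_AUDIT.md" ]; then
    echo auditor
  elif [ ! -f "$dir/PRODUCT_SPEC.md" ]; then
    echo planner
  elif [ ! -f "$dir/QA_REPORT.md" ]; then
    echo builder
  else
    echo orchestrator   # a QA report exists: read it and HANDOFF.md to decide
  fi
}
```

Deleting or editing a file in `.harness/` shifts the inferred resume point, which is the same property that makes the run human-steerable.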

Installation

```
/plugin marketplace add woink/harness-skills-plugin
/plugin install harness-skills@harness-skills-plugin
```

Usage

New project:

```
Spin up the harness: Build a real-time dashboard for NYC park conditions.
```

Existing project:

```
Run the harness on this project. I want to add real-time notifications.
```

Resume after context reset:

```
Resume the harness.
```

Unattended multi-session run:

```bash
bash scripts/run-harness.sh --max-iterations 20
```
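The internals of `scripts/run-harness.sh` are not described here. As a hypothetical sketch of the general shape of an unattended multi-session loop, the functions below start fresh sessions until a completion check passes or an iteration cap like `--max-iterations 20` is reached. `run_session`, `is_complete`, and the `.harness/DONE` marker are placeholders invented for illustration, not parts of the real script.

```shell
# Placeholders (hypothetical): swap in a real session launcher and completion check.
run_session() { echo "would start session $1 here" >&2; }
is_complete() { [ -f .harness/DONE ]; }

# run_loop MAX_ITERATIONS: call run_session up to MAX times, stopping early
# once is_complete succeeds. Returns 0 on completion, 1 if a session fails,
# and 2 if the iteration cap is reached first.
run_loop() {
  local max i
  max=${1:?usage: run_loop MAX_ITERATIONS}
  i=0
  while [ "$i" -lt "$max" ]; do
    if is_complete; then
      echo "complete after $i session(s)"
      return 0
    fi
    i=$((i + 1))
    run_session "$i" || return 1
  done
  if is_complete; then
    echo "complete after $i session(s)"
    return 0
  fi
  echo "stopped: reached $max iterations" >&2
  return 2
}
```

Each iteration is a fresh session, so the loop itself carries no conversation state; everything a session needs has to come from `.harness/`.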

Configuration

`.harness/config.yaml` controls thresholds and gates:

```yaml
harness_version: 1
project_mode: greenfield        # greenfield | enhance | refactor | rescue
eval_mode: hybrid               # playwright | automated-tests | manual-gates | hybrid
max_sprint_retries: 2
qa_pass_threshold:
  functionality: 7              # hard floor — does it work?
  design_quality: 6
  code_quality: 6
  product_depth: 6
manual_gates:
  after_audit: true
  after_spec: true
  after_each_sprint: false
  after_final_qa: true
```
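The thresholds and retry limit above suggest a sprint gate along these lines. This is a hypothetical sketch, not the evaluator's actual scoring code: `qa_verdict`, `read_int`, and the `escalate` outcome are illustrative names, and it assumes every score must meet its threshold (the config only labels `functionality` a hard floor, so the other three may be treated more softly in practice). `read_int` is a plain `sed` match that only handles flat `key: integer` lines like the ones in this file, not general YAML.

```shell
# read_int KEY CONFIG: print the integer value of an (optionally indented) "KEY: N" line.
read_int() {
  sed -n "s/^[[:space:]]*$1:[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$2"
}

# qa_verdict CONFIG RETRIES_USED FUNCTIONALITY DESIGN CODE DEPTH
# Prints pass, retry (another builder attempt is allowed), or escalate.
qa_verdict() {
  local cfg used ok key min
  cfg=$1
  used=$2
  ok=yes
  shift 2
  for key in functionality design_quality code_quality product_depth; do
    min=$(read_int "$key" "$cfg")
    # A missing threshold makes the comparison error out, which counts as a fail.
    [ "$1" -ge "$min" ] 2>/dev/null || ok=no
    shift
  done
  if [ "$ok" = yes ]; then
    echo pass
  elif [ "$used" -lt "$(read_int max_sprint_retries "$cfg")" ]; then
    echo retry
  else
    echo escalate   # hypothetical: hand control back for a manual decision
  fi
}
```

With the defaults above, scores of 6/9/9/9 on a first attempt yield `retry` because functionality misses its floor of 7, and the same scores yield `escalate` once both allowed retries have been used.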

Cost Estimates

| Phase | Typical Cost | Duration |
|---|---|---|
| Auditor | $1–5 | 5–15 min |
| Planner | $0.50–2 | 3–10 min |
| Builder (per sprint) | $20–70 | 30–120 min |
| Evaluator (per sprint) | $3–5 | 5–10 min |

A 5-sprint build with 1 retry averages $100–200 total.
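As a rough cross-check on the table, assume a greenfield build (no auditor) in which the single retry re-runs one full builder sprint and its evaluation, giving six builder passes and six evaluator passes. Both the greenfield setting and this retry accounting are assumptions, not stated above. Summing the per-phase ranges in dollars:

```math
\begin{aligned}
C_{\min} &= \underbrace{0.50}_{\text{planner}} + 6 \times \underbrace{20}_{\text{builder}} + 6 \times \underbrace{3}_{\text{evaluator}} = 138.50 \\
C_{\max} &= \underbrace{2}_{\text{planner}} + 6 \times \underbrace{70}_{\text{builder}} + 6 \times \underbrace{5}_{\text{evaluator}} = 452
\end{aligned}
```

Under those assumptions the per-phase ranges sum to roughly 138.50 to 452 dollars, so the quoted average sits at or below the low end of that band; an existing-project run adds the auditor's $1–5.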

Based On

Anthropic Engineering: Harness Design for Long-Running Apps
