Ward's Claude harness
Ward Price 79e67f46dc
fix(harness): remove --no-auto-compact flag (not supported by claude CLI)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:21:41 -04:00
.claude feat(harness): register context-monitor PreToolUse hook in project settings 2026-04-16 20:56:21 -04:00
.claude-plugin docs: rewrite README with mermaid architecture diagram and fix marketplace source field 2026-04-13 00:44:07 -04:00
.github chore: add CODEOWNERS to require owner review on all PRs 2026-04-16 21:51:05 -04:00
docs docs: add context-aware session cycling implementation plan 2026-04-16 20:43:56 -04:00
docs-site fix(docs-site): apply baseUrl to root redirect for GitHub Pages 2026-04-14 22:57:12 -04:00
scripts fix(harness): remove --no-auto-compact flag (not supported by claude CLI) 2026-04-16 22:21:41 -04:00
skills chore(harness): mirror orchestrator skill to nested plugin path 2026-04-16 21:00:53 -04:00
.gitignore chore: ignore .harness/context-tool-calls runtime file 2026-04-16 21:01:36 -04:00
CLAUDE.md chore: collapse permission list to minimal wildcards 2026-04-15 00:03:11 -04:00
README.md docs: rewrite README with mermaid architecture diagram and fix marketplace source field 2026-04-13 00:44:07 -04:00

harness-skills-plugin

A Claude Code plugin that implements the multi-agent build harness architecture described in the Anthropic Engineering post "Harness Design for Long-Running Apps."

Extends the original pattern with a codebase auditor for existing projects, file-based handoffs that survive context resets, and a polish phase for iterative refinement.

Architecture

flowchart TD
    U([User Prompt]) --> O

    subgraph Harness["Claude Code — Harness Skills"]
        O["🎯 Orchestrator\nharness-orchestrator\nSequencing · Gates · Handoffs"]

        O -->|new project| P
        O -->|existing project| A
        A["🔍 Auditor\nharness-auditor\nCodebase Analysis · Tech Debt · Constraints"]
        A -->|CODEBASE_AUDIT.md| P

        P["📋 Planner\nharness-planner\nProduct Spec · Calibration Anchors · Sprint Plan"]
        P -->|PRODUCT_SPEC.md| B

        B["🔨 Builder\nharness-builder\nSprint Execution · Git Commits · Sprint Contracts"]
        B -->|source code| E

        E["✅ Evaluator\nharness-evaluator\nQA · Scoring · Pivot Suggestions · Polish Mode"]

        E -->|pass| O
        E -->|retry| B
        E -->|polish loop| B
    end

    subgraph State[".harness/ — File-Based State"]
        direction LR
        H[HANDOFF.md]
        S[PRODUCT_SPEC.md]
        Q[QA_REPORT.md]
        C[config.yaml]
    end

    O <-->|read/write| State
    P <-->|read/write| State
    B <-->|read/write| State
    E <-->|read/write| State

    O -->|complete| Done([Shipped Product])

Key insight from the Anthropic post: separated evaluation beats self-evaluation. The evaluator reads builder output cold, from files, with a skeptical prompt — it never shares conversation context with the builder.

Skills

| Skill | Role | Key Outputs |
| --- | --- | --- |
| harness-orchestrator | Entry point, sequencing, manual gates | config.yaml, HANDOFF.md |
| harness-auditor | Codebase analysis for existing projects | CODEBASE_AUDIT.md |
| harness-planner | Prompt → product spec + sprint plan | PRODUCT_SPEC.md, CALIBRATION.md |
| harness-builder | Spec → working code, sprint by sprint | Source code, git commits |
| harness-evaluator | QA, grading, feedback, polish loop | QA_REPORT.md |
| harness-brief | Standalone project brief generator | PROJECT_BRIEF.md |

How File-Based Handoffs Work

Each agent reads its inputs from .harness/ files and writes its outputs there — no conversation context is shared between phases. This means:

  • Context resets are safe: resume at any time with the prompt "Resume the harness."
  • Subagent-ready: each phase can run as a separate Claude subagent
  • Human-steerable: edit .harness/ files directly to change scope, thresholds, or sprint plans mid-run
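
To make the handoff idea concrete, here is a minimal sketch of one phase recording its result and a fresh session reading it back from disk. The key names (`completed_phase`, `next_phase`, `note`) and both helper functions are hypothetical. The plugin's actual HANDOFF.md layout is not documented here and may look quite different.

```shell
#!/usr/bin/env bash
# Sketch only: the field names below are assumptions, not the plugin's schema.

# Defaults to a temp directory so the sketch never touches a real project.
HARNESS_DIR="${HARNESS_DIR:-$(mktemp -d)/.harness}"

# write_handoff COMPLETED_PHASE NEXT_PHASE NOTE
# Overwrites HANDOFF.md with the state the next session needs.
write_handoff() {
  mkdir -p "$HARNESS_DIR"
  printf 'completed_phase: %s\nnext_phase: %s\nnote: %s\n' "$1" "$2" "$3" \
    > "$HARNESS_DIR/HANDOFF.md"
}

# read_handoff_field KEY
# Prints the value stored under KEY; returns 1 if no handoff file exists.
read_handoff_field() {
  local file="$HARNESS_DIR/HANDOFF.md"
  [ -f "$file" ] || return 1
  sed -n "s/^$1: //p" "$file"
}

# The planner finishes; a later session learns what to do from the file alone.
write_handoff planner builder "PRODUCT_SPEC.md approved"
echo "Next phase: $(read_handoff_field next_phase)"
```

Because the state lives in a plain file, a human can also edit it between phases, which is what makes the run steerable.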

Installation

/plugin marketplace add woink/harness-skills-plugin
/plugin install harness-skills@harness-skills-plugin

Usage

New project:

Spin up the harness: Build a real-time dashboard for NYC park conditions.

Existing project:

Run the harness on this project. I want to add real-time notifications.

Resume after context reset:

Resume the harness.

Unattended multi-session run:

bash scripts/run-harness.sh --max-iterations 20
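
The contents of scripts/run-harness.sh are not shown here, so the following is only a rough sketch of how a bounded multi-session loop could work: start a fresh session, check for a completion marker, and stop after a maximum number of iterations. The `run_sessions` function and the `DONE` marker file are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: not the plugin's actual script.

# run_sessions MAX_ITERATIONS DONE_FILE COMMAND [ARGS...]
# Runs COMMAND repeatedly until DONE_FILE exists or MAX_ITERATIONS is reached.
# Returns 0 if the marker file exists at the end, 1 otherwise.
run_sessions() {
  local max="$1" done_file="$2"
  shift 2
  local i
  for ((i = 1; i <= max; i++)); do
    if [ -e "$done_file" ]; then
      echo "finished before iteration $i"
      return 0
    fi
    echo "iteration $i of $max"
    # A failed session is not fatal; file-based state lets the next one resume.
    "$@" || echo "session exited non-zero; continuing" >&2
  done
  [ -e "$done_file" ]
}

# Demo with a no-op command that never creates the marker file.
demo_dir="$(mktemp -d)"
run_sessions 3 "$demo_dir/DONE" true || echo "not finished after 3 iterations"

# In a real run the command would start a Claude session, for example
# (assumed invocation and marker path):
#   run_sessions 20 .harness/DONE claude -p "Resume the harness."
```

The iteration cap is what keeps an unattended run from looping indefinitely if the harness never reaches a finished state.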

Configuration

.harness/config.yaml controls thresholds and gates:

harness_version: 1
project_mode: greenfield        # greenfield | enhance | refactor | rescue
eval_mode: hybrid               # playwright | automated-tests | manual-gates | hybrid
max_sprint_retries: 2
qa_pass_threshold:
  functionality: 7              # hard floor — does it work?
  design_quality: 6
  code_quality: 6
  product_depth: 6
manual_gates:
  after_audit: true
  after_spec: true
  after_each_sprint: false
  after_final_qa: true
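
As an illustration of how these thresholds might gate a sprint, the sketch below reads values from a config file and compares evaluator scores against them. It assumes a sprint passes only when every score meets or exceeds its threshold, which is a guess at the evaluator's rule. `config_value` and `sprint_passes` are hypothetical helpers, and the awk lookup handles only simple `key: value` lines rather than full YAML.

```shell
#!/usr/bin/env bash
# Sketch only: the pass rule and helper names are assumptions.

# config_value FILE KEY
# Prints the scalar after "KEY:" on the first matching line, ignoring comments.
config_value() {
  awk -v key="$2" '
    { sub(/#.*/, "") }                 # strip trailing comments
    $1 == key ":" { print $2; exit }   # first field is "key:", second the value
  ' "$1"
}

# sprint_passes CONFIG name=score [name=score ...]
# Returns 0 if every score is at or above its configured threshold.
sprint_passes() {
  local config="$1"
  shift
  local pair name score floor
  for pair in "$@"; do
    name="${pair%%=*}"
    score="${pair#*=}"
    floor="$(config_value "$config" "$name")"
    if [ -z "$floor" ]; then
      echo "unknown criterion: $name" >&2
      return 2
    fi
    if [ "$score" -lt "$floor" ]; then
      echo "FAIL $name: $score < $floor"
      return 1
    fi
  done
  echo "PASS"
}

# Demo against a temporary copy of the thresholds shown above.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
max_sprint_retries: 2
qa_pass_threshold:
  functionality: 7              # hard floor
  design_quality: 6
  code_quality: 6
  product_depth: 6
EOF
sprint_passes "$cfg" functionality=8 design_quality=6 code_quality=7 product_depth=6
```

A caller could combine this with `max_sprint_retries` to decide whether a failed sprint goes back to the builder or is escalated to a manual gate.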

Cost Estimates

| Phase | Typical Cost | Duration |
| --- | --- | --- |
| Auditor | $1–5 | 5–15 min |
| Planner | $0.50–2 | 3–10 min |
| Builder (per sprint) | $20–70 | 30–120 min |
| Evaluator (per sprint) | $3–5 | 5–10 min |

A 5-sprint build with 1 retry averages $100–200 total.

Based On

Anthropic Engineering: Harness Design for Long-Running Apps