claude-review: improve review quality for large PRs

Several issues were identified from analyzing logs of a large (52-commit) PR
review:

- Claude was batching multiple commits into a single review agent instead of
  one per worktree. Strengthen the prompt to explicitly prohibit grouping.
- Claude was reading pr-context.json and commit messages before spawning
  agents despite instructions not to, wasting time. Tighten the pre-spawn
  rules to only allow listing worktrees/ and reading review-schema.json.
- Subagents were spawned with model "sonnet" instead of "opus". Add explicit
  instruction to use opus.
- After agents returned, Claude spent 9 minutes re-verifying findings with
  bash/grep/sed commands, duplicating the agents' work. Add instruction to
  trust subagent findings and only read pr-context.json in phase 2.
- Subagents returned markdown-wrapped JSON instead of raw JSON arrays. Add
  instruction requiring raw JSON output only.
- Each subagent was independently reading review-schema.json. Instead have
  the main agent read it once and paste it into each subagent prompt.
- The "drop low-confidence findings" instruction was being used to justify
  dropping findings that Claude itself acknowledged as valid ("solid cleanup
  suggestions", "reasonable consistency improvement"). Remove the instruction.
- Simplify the deduplication instructions
- Stop adding the severity to the body in the post processing job as claude is
  also adding it so they end up duplicated.
This commit is contained in:
Daan De Meyer
2026-04-10 08:04:08 +00:00
committed by Daan De Meyer
parent fb0ae7436d
commit a65ebc3ff9

View File

@@ -256,27 +256,34 @@ jobs:
## Phase 1: Review commits
List the directories in `worktrees/` — there is one per commit. Each
worktree at `worktrees/<sha>/` contains the full source tree checked out at
that commit, plus `commit.patch` (the diff) and `commit-message.txt`
(the commit message). Spawn one
review subagent per worktree, all in a single message so they run concurrently.
Do NOT pre-compute diffs or read any other files before spawning — the subagents
will do that themselves.
First, list the directories in `worktrees/` and read `review-schema.json`.
Then, spawn exactly one review subagent per worktree directory, all in a
single message so they run concurrently. Do NOT batch or group multiple
commits into a single agent. Do NOT read any other files before spawning —
the subagents will do that themselves.
Each worktree at `worktrees/<sha>/` contains the full source tree checked
out at that commit, plus `commit.patch` (the diff) and `commit-message.txt`
(the commit message).
Each reviewer reviews design, code quality, style, potential bugs, and
security implications.
Each subagent must be spawned with `model: "opus"`.
Each subagent prompt must include:
- Instructions to read `pr-context.json` in the repository root for additional
context.
- Instructions to read `review-schema.json` in the repository root and
return a JSON array matching the `comments` items schema from that file.
- The contents of `review-schema.json` (paste it into each prompt so the
agent doesn't have to read it separately).
- The worktree path.
- Instructions to read `commit-message.txt` and `commit.patch` in the
worktree for the commit message and diff.
- Instructions to verify every `line` and `start_line` value
against the hunk ranges in `commit.patch` before returning.
- Instructions to return ONLY a raw JSON array of findings. No markdown,
no explanation, no code fences — just the JSON array. If there are no
findings, return `[]`.
## Phase 2: Collect, deduplicate, and summarize
@@ -290,26 +297,20 @@ jobs:
populate the `resolve` array.
- If `tracking_comment` is non-null, use it as the basis for your summary.
Trust the subagent findings — do NOT re-verify them by running your own
bash, grep, sed, or awk commands against the source code. Phase 2 should
only read `pr-context.json` and then produce the structured output.
Then:
1. Collect all issues. Merge duplicates (same file, lines within 3 of each other, same problem).
2. Drop low-confidence findings.
3. Check the existing inline review comments from `pr-context.json`. Do NOT
include a comment if one already exists on the same file about the same
problem, even if the exact line numbers differ (lines shift between
revisions). Also check for author replies that dismiss or reject a previous
comment — do NOT re-raise an issue the PR author has already responded to
disagreeing with.
Populate the `resolve` array with the REST API `id` (integer) of your own
review comment threads that should be resolved (user.login == "github-actions[bot]"
and body starts with "Claude: "). Do not resolve threads from human reviewers.
A thread should be resolved if:
- The issue it raised has been addressed in the current PR (i.e. your review
no longer flags it), or
- The PR author (or another reviewer) left a reply disagreeing with or
dismissing the comment.
Only include the `id` of the **first** comment in each thread (the one that
started the conversation). Do not resolve threads for issues that are still
present and unaddressed.
1. Collect all issues. Merge duplicates across agents (same file, same
problem, lines within 3 of each other).
2. Drop issues that already have a review comment on the same file about
the same problem, or where the PR author replied disagreeing.
3. Populate the `resolve` array with the `id` of your own review comment
threads (user.login == "github-actions[bot]", body starts with
"Claude: ") that should be resolved — either because the issue was
fixed or because the author dismissed it. Use the first comment `id`
in each thread. Do not resolve threads from human reviewers.
4. Write a `summary` field in markdown for a top-level tracking comment.
**If no existing tracking comment was found (first run):**
@@ -486,7 +487,7 @@ jobs:
...(c.side != null && { side: c.side }),
...(c.start_line != null && { start_line: c.start_line }),
...(c.start_side != null && { start_side: c.start_side }),
body: `Claude: **${c.severity}**: ${c.body}`,
body: c.body,
});
posted++;
} catch (e) {