Replace claude-code-action with a direct claude CLI invocation. This
gives us explicit control over settings, permissions, and output
handling.
Other changes:
- Prepare per-commit git worktrees with pre-generated commit.patch and
commit-message.txt files, replacing the pr-review branch approach.
- Use structured JSON output (--output-format stream-json --json-schema)
instead of having Claude write review-result.json directly.
- Use jq instead of python3 for JSON prettification.
- Add timeout-minutes: 60 to the review job.
- List tool permissions explicitly instead of using a wildcard.
- Fix sandbox filesystem paths to use regular paths instead of the "//"
prefix.
The fix in 8f1751a111 made me wonder if we could automatically detect
when pointers are accessed but when this might not be safe. Systemd
is already using a lot of `assert(dst)` and this change now forces
us to use them.
So this commit (ab)uses coccinelle to flag any pointer parameter
dereference not preceded by assert(param), ASSERT_PTR(param), or an
explicit NULL check. It adds integration into meson as a new "coccinelle"
test suite (just like clang-tidy) and is run in CI. The check is not
perfect but seems a reasonable heuristic.
For this RFC commit it is scoped to a subset, it excludes 25 dirs right
now and includes around 100. About 300 warnings left. Busywork that I am
happy to do if there is agreement that it is worth it.
With this in place we would have caught the bug from 8f1751a111 in CI:
```
FAIL: check-pointer-deref.cocci found issues in systemd/src/boot:
diff -u -p systemd/src/boot/measure.c /tmp/nothing/measure.c
--- systemd/src/boot/measure.c
+++ /tmp/nothing/measure.c
@@ -312,7 +312,6 @@ EFI_STATUS tpm_log_tagged_event(
if (err != EFI_SUCCESS)
return err;
- *ret_measured = true;
return EFI_SUCCESS;
}
```
This also adds a new POINTER_MAY_BE_NULL() for the cases when the
called function will do the NULL check (like `iovec_is_set()`).
When doing large refactors or large changes the bot spams
labels left and right, making the PR unreadable. Use the
new option to limit the bot to a max of 5 file-based
labels. If more than 5 would be set, all file-based labels
are skipped.
Pass side, start_line, and start_side through to createReviewComment()
when present, enabling multi-line review comments. Update the prompt to
document all positioning fields using JSON Schema and make line required.
In https://github.com/systemd/systemd/pull/40980 claude hallucinated
and used "path" instead of "file" as the JSON key. Since "path" is
arguably more correct than "file" anyway, let's switch to that.
After analyzing all 218 CodeQL alerts across the project's history, the
workflow has not justified its CI cost:
- The most impactful query (PotentiallyDangerousFunction) was a custom
systemd-specific query that has already been replaced by clang-tidy's
bugprone-unsafe-functions check (6fb5ec3dd1).
- Of the remaining C++ queries, 6 never triggered at all
(bad-strncpy-size, unsafe-strcat, unsafe-strncat,
suspicious-pointer-scaling, suspicious-pointer-scaling-void,
inconsistent-null-check).
- Several high-value-sounding queries had extreme false positive rates:
toctou-race-condition (95% FP), use-after-free (88% FP),
cleartext-transmission (100% FP).
- Many queries that did trigger are already covered by compiler warnings
(-Wshadow, -Wformat, -Wunused-variable, -Wreturn-type,
-Wtautological-compare) or existing clang-tidy checks
(bugprone-sizeof-expression).
- Across all alerts, only 3 genuinely useful C++ fixes can be
attributed to CodeQL: 1 tainted-format-string, 2
incorrectly-checked-scanf. The rest were either false positives or
incidental fixes during refactoring that weren't prompted by CodeQL.
- The Python queries are largely superseded by ruff (already in CI) and
had an 89% false positive rate on the security-focused checks.
The workflow consumed significant CI resources (40+ minutes per run) and
the ongoing maintenance burden of triaging false positives outweighs the
marginal value of the 2-3 real findings it produced across its entire
lifetime.
I noticed looking at the logs that claude spends a lot of time re-checking
existing comments, so let's update the prompt to hopefully reduce
the amount of comments that it re-checks.
Let's stop using --json-schema and instead have claude write a JSON
file in the repo root which we pass around as an artifact similar to
how we pass around the input. This works around the bug where claude
receives task notifications after producing structured output which
breaks the structured output.
We often have a pattern where the same verb function is used for
multiple actions. This leads to an antipattern where we figure out what
action needs to be taken from argv[0] multiple times: often once in
arse_argv() to figure out what options are allowed, then once again
implicitly in dispatch_verb(), and then again in the action verb itself.
Let's allow passing a parameter into the verb to simplify this.
As it seems impossible to prevent claude from receiving notifications
about subagents finishing after it has produced structured output, which
breaks the structured output as it has to be the final reply, let's stop
using subagents and background tasks completely to avoid the issue.
claude keeps failing by its subagents completing after it has already
written the review for large prs. It seems to run out of turns, tries
to get the subagents to post partial reviews but doesn't seem to stop
them.
Let's insist that it waits for background tasks to stop but let's also
increase the max turns a bit so it doesn't run out as quickly.
claude wants to use python to access the JSON context so let's allow
it. Since python3 basically allows you to reimplement every other tool,
let's just enable all tools except the web related ones but enable network
isolation so it can't try to exfiltrate anything via python.
In CI we get spurious failures about unitialized variables with gcc
versions older then (depending on the case) 12, 13, or 14. Let's only
try to do this check with newer gcc which returns more useful results.
At the same time, do compile with both gcc and clang at -O2, just
disable the warning.
The old logic seems to have been confused. We compile with -Wall, at
least in some cases, which includes -Wmaybe-unitialized. So if we
_don't_ want it, we need to explicitly disable it.
The labelling approach introduced in 6089075265
means contributors can now trigger the workflow on their own when the label
is added by a maintainer and they update the PR. Hence we need to allow all
users to access the claude code action. This is safe because we already gate
the workflow ourselves to only the contributors that we want to allow.
Additionally, the claude code job has no permissions anymore except read access
to the repository and can execute very limited tools, so this should be safe.
The name doesn't actually matter, it gets replaced with the name
of the file when not archiving. So stop passing a name and pass in
the filename as the name when downloading the artifact.
Rather than have claude fetch the context itself, let's fetch the
context for it in the setup job. This has the following advantages:
- We can reduce the permissions granted to the claude job
- claude has less opportunity to mess up trying to fetch the context
itself. Specifically, it keeps spawsning a background task to fetch
the PR branch which messes up the structured output at the end, causing
the review job to fail. By pre-fetching the context it won't have to
spawn the background task. Additionally, we limit the git commands it
can execute to local ones to ensure it doesn't try to fetch the PR branch.
Finally, we fetch the branch ourselves as pr-review so claude can look at it
to review the PR.
- If a pr is labeled with claude-review, review it immediately
- If a pr labeled with claude-review is updated, review it regardless
of the author
- If a pr is opened by a maintainer, review it and add the claude-review
label. If the claude-review label is later removed, the pr won't be
auto-reviewed anymore.
Bash(gh:api *) wasn't actually working. Turns out the colon syntax
is deprecated and unnecessary. Let's stop using it which also fixes
the bug so that gh api calls are allowed now.
Claude now identifies which existing review comment threads should be
resolved (because the issue was addressed or someone disagreed) and
returns their REST API IDs in a new `resolve` array in the structured
output. The post job uses GraphQL to map comment IDs to threads and
resolve them.
Also switches all GitHub data fetching from MCP tools to `gh api` calls,
since the MCP tool strips comment IDs during its GraphQL-to-minimal
conversion and cannot be used for thread resolution.
The thread resolution GraphQL pagination is wrapped in a try/catch so
that a failure to fetch threads degrades gracefully instead of aborting
the entire post job. Unmatched comment IDs are logged for debuggability.
Adds explicit instructions to complete all data fetching before starting
review and to cancel background tasks before returning structured output,
working around a claude-code-action issue where a late-completing
background task triggers a new conversation turn that overwrites the
structured JSON result.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch claude-review from reviewing the entire PR diff at once to
reviewing each commit individually via subagents. Each commit review
subagent receives the PR context, preceding commit diffs, and its own
commit diff, then returns comments tagged with the commit SHA. This
ensures review comments are attached to the correct commit via the
GitHub API rather than all pointing at HEAD.
Also add Bash(gh:*) to allowed tools so subagents can fetch per-commit
diffs via `gh api` without needing local git objects, and remove CI
analysis (needs to be delayed until CI finishes to be useful).
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Address feedback from facebook/bpfilter#472:
- Fix setFailed error message counting file-level comments (without
line numbers) that are intentionally skipped, use inlineComments.length
instead of comments.length
- Fix double severity prefix in inline comments: the prompt told Claude
to prefix body with **must-fix**/etc but the post job also prepended
"Claude: ", producing "Claude: **must-fix**: ...". Now the prompt says
not to prefix and the post job adds "Claude **severity**: " using the
structured severity field
- Move error tracking instructions to a top-level section after all phases
so they apply to all runs, not just the first run
- Clarify that line is optional: use "should be" instead of "must be"
and document that omitting line still surfaces the comment in the
tracking comment summary
- Distinguish cancelled vs failed in tracking comment message
- Add side: "RIGHT" and subject_type: "line" to createReviewComment
per GitHub API recommendations
- Downgrade partial inline comment posting failures to warnings; only
fail the job when no comments at all could be posted
Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
- Use github.paginate() for listComments to handle PRs with 100+ comments
- Make line optional in review schema to allow file-level comments
- Skip createReviewComment for comments without a line number
- Fix failed count to exclude skipped file-level comments
- Pass review result via env var instead of expression injection
- Use core.warning() instead of console.log() for JSON parse failures
- Fix MARKER insertion for single-line summaries that have no newline
- Require "@claude review" instead of just "@claude" to trigger
Co-developed-by: Claude <claude@anthropic.com>