Skip to content

flue: add generic code review alongside style-guide review#31562

Open
mvvmm wants to merge 30 commits into
productionfrom
code-review
Open

flue: add generic code review alongside style-guide review#31562
mvvmm wants to merge 30 commits into
productionfrom
code-review

Conversation

@mvvmm

@mvvmm mvvmm commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Adds a generic engineering code review to the Flue bot. Both the code review and existing style-guide review now post to a single bot comment split into ### Code Review and ### Style Guide Review sections.

Architecture

Both reviewers run as separate specialist workflows (code-review-specialist, style-guide-specialist), each in their own Durable Object isolate with its own 128 MB memory budget. The orchestrator admits both and polls them concurrently; one failing degrades to an empty section rather than aborting the whole review.

Code review specialist

  • Reviews all changed text files (excluding lockfiles, generated output, binary/image files), capped at 20 files (largest-diff-first).
  • Trusted TypeScript pre-stages the data before any model call: added lines are parsed from the diff patch (accurate new-file line numbers), and the full file content is fetched at the head SHA. Both are passed directly as args.addedLines and args.fileContent, eliminating the mandatory setup turns the model previously spent reading manifests and fetching files.
  • read_repo_file (defaults to PR head SHA) and search_repo remain available as optional tools for cross-file lookups (callers, import sites, etc.).
  • The root AGENTS.md is fetched from the PR base ref and injected as agent instructions so the reviewer has repo conventions without a checkout. (Base ref, not head, to prevent a PR from poisoning the reviewer's instructions.)
  • Findings use a critical / warning / suggestion severity hierarchy with CR- IDs.

Style guide specialist

  • Reviews only src/content/(docs|partials|changelog)/**.mdx, capped at 20 files.
  • Reads the diff and full file content from the staged workspace via the code tool.
  • warning / suggestion severities only.

Memory and timeout

  • Each per-file agent session is deleted immediately after it finishes (session.delete()), bounding peak heap to ~concurrency live sessions rather than growing with the file count.
  • Per-file hard timeout (10 min, env-overridable via CODE_REVIEW_FILE_TIMEOUT_MS / STYLE_GUIDE_FILE_TIMEOUT_MS): on timeout, the CallHandle is aborted first (settling the operation) and then the session is deleted, preventing zombie session leaks.
  • Concurrency is env-overridable (CODE_REVIEW_CONCURRENCY, STYLE_GUIDE_CONCURRENCY) so local dev can run lower values without touching prod defaults.

Other

  • Reconciler handles each stream (code, style) independently against its prior findings + human comments; persists { code, style } to R2 (legacy bare-array shape tolerated).
  • Incremental diff mode (SHA-pinned from last reviewed head to current head) when a prior review exists; full diff otherwise. Specialists self-heal incremental → full if the base SHA is gone.
  • Cadence unchanged: automatic on open/synchronize/ready_for_review (up to 2-review cap), plus /review, /full-review, and /ignore-review-limit codeowner commands.
  • Beta disclaimer rendered under the Code Review heading.

Validated

tsc --noEmit, flue build, prettier. Verified end-to-end in log mode and comment mode against real PRs.

Adds a bonk-style generic engineering review that runs in the same
orchestrator run as the existing style-guide review and posts findings
in the same bot comment, now split into "### Code Review" and
"### Style Guide Review" sections.

- New code-review skill: reviews all changed files (including code
  examples in MDX) for bugs, error handling, security, dead code, and
  maintainability. No style/prose checks; defers CI-caught issues.
- New in-process fan-out (lib/code-review-inproc.ts) mirroring the
  style-guide one: per-file sessions, concurrency 5, cap 20 files.
- Code findings add a `critical` severity above warning/suggestion and
  use the CR- id namespace; reconcile skill preserves it.
- Code-review agent gets full-file context via GitHub-API-backed tools
  (read_repo_file defaults to the PR head SHA, search_repo).
- The repo root AGENTS.md is fetched at the head SHA and injected as
  agent instructions every run, since the Worker has no repo checkout
  to discover it from.
- Orchestrator runs both reviews concurrently (one failing degrades to
  empty rather than aborting), reconciles per stream, persists
  { code, style } findings to R2 (legacy bare-array tolerated), and
  renders one unified comment.

Cadence is unchanged: this rides the existing code-review-orchestrator
triggers (auto on PRs, /review, /full-review).
@mvvmm mvvmm requested review from a team and kodster28 as code owners June 18, 2026 16:50
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:

Pattern Owners
* @cloudflare/product-owners
*.ts @cloudflare/content-engineering, @kodster28
package.json @cloudflare/content-engineering

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

mvvmm added 5 commits June 18, 2026 12:57
Fixes surfaced by the bot reviewing its own PR (#31562):

- getRepoFileContent: decode base64 as UTF-8 via TextDecoder (atob alone
  produced mojibake for non-ASCII content such as em dashes in AGENTS.md);
  URL-encode path segments; return null only on 404 and throw on other
  non-2xx so "absent" is distinct from "failed to load".
- Fetch the root AGENTS.md from the PR base ref instead of the head SHA.
  The content is injected into agent instructions, so reading it from the
  head let a PR poison AGENTS.md to inject instructions into the reviewer.
- Render: a degraded (failed) review section now shows a failure note
  instead of "No issues found", and the status line notes the partial
  failure; carried-forward findings still render. Orchestrator passes the
  per-stream failure flags into renderComment.
- Render: match acknowledged findings by stable id, not path:line:rule, so
  two findings sharing those fields are not both dropped.
- github-repo-tools.ts: update stale file header (now shared by the
  Dependabot and code-review agents).
Running the code-review and style-guide fan-outs concurrently kept ~10
model sessions live in a single Durable Object isolate and over-ran its
128 MB limit, producing repeated "isolate exceeded its memory limit and
was reset" errors (and downstream event-stream append failures).

- Run the two fan-outs sequentially instead of via Promise.all, so one
  fan-out's sessions are reclaimed before the next starts (roughly halves
  peak heap). Per-stream failure isolation is preserved.
- Lower code-review concurrency from 5 to 3, since its sessions are
  heavier (full-file reads + injected AGENTS.md) than style-guide ones.

Trade-off: review wall-clock is now the sum of the two fan-outs rather
than the max — acceptable for a background review bot.
Move the code-review and style-guide fan-outs out of the orchestrator DO
into their own discovered workflows, so each runs in its own Durable
Object isolate (own 128 MB budget). This removes the shared-isolate
memory contention that caused "isolate exceeded its memory limit and was
reset" errors and restores parallel execution.

- New workflows code-review-specialist + style-guide-specialist
  (FlueCodeReviewSpecialistWorkflow / FlueStyleGuideSpecialistWorkflow).
  Each self-fetches its own diff once (no per-file fetch, no R2 diff
  staging), stages it into its own workspace, runs the existing in-process
  fan-out (reused verbatim), and returns findings as its run result.
- Orchestrator now decides diff mode only (no file fetch), admits both
  specialists via admitWorkflow and polls them concurrently via pollRun
  (the spam-filter pattern). A timed-out/errored specialist degrades to an
  empty section with its failure flag set; both failing -> failure comment.
- Specialists self-heal incremental -> full diff if the base SHA is gone.
- Code-review concurrency restored to 5 (own isolate now).
- New shared lib/review-specialist.ts: payload contract + helpers.
- writeDiffToWorkspace accepts a minimal DiffPullRequest shape.
- wrangler v5 migration adds the two new SQLite DO classes.

Supersedes the sequential-fan-out workaround; the fan-outs no longer run
in the orchestrator DO at all.
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 18, 2026
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 18, 2026
mvvmm added 8 commits June 18, 2026 14:35
The code-review specialist timed out on large PRs. AI Gateway showed all
model calls succeeding (0 failures) but slow (p90 ~47s, p99 ~103s), so the
cause was DO memory pressure plus total wall-clock, not model errors:

- A 13-file run at concurrency 5 hard-OOM'd the specialist isolate
  ("SQL query failed: ... memory limit") and never reached run_end.
- Even without OOM, 13 multi-turn file sessions exceeded the orchestrator's
  10-minute poll deadline.

Fixes:
- CODE_REVIEW_MAX_FILES 20 -> 10. Each code-review file is a multi-turn
  agent session, so this bounds both peak isolate memory and wall-clock.
  Covers virtually all docs PRs; larger PRs review their 10 largest files.
- Orchestrator specialist poll timeout 10m -> 20m, since a legitimate large
  review is many slow model calls and the specialist runs durably in its
  own DO.
- Concurrency stays at 3 (memory).
Delete each per-file agent session as soon as it finishes so harness
memory is bounded to ~concurrency live sessions instead of growing with
the changed-file count. With the leak fixed, restore the code-review cap
to 20 files and concurrency to 5, and drop the dead reserveTokens hint.
Pre-stage added lines + file content into skill args (code-review-inproc.ts,
code-review-specialist.ts, SKILL.md): trusted TypeScript now parses the diff
and fetches the full file at head SHA before each session opens, eliminating
the mandatory setup turns (read manifest → read patch → parse lines →
read_repo_file). Cross-file tools remain available for optional lookups.
Removes diff workspace staging from the code-review path entirely.

Add external cron watchdog (review-watchdog.ts, review-watchdog-decide.ts,
run-state.ts): a scheduled handler fires every 2 minutes, detects reviews
stuck past a deadline via inflight R2 markers, and re-drives them. Workflows
are non-resumable after DO interruptions; the watchdog is the only reliable
recovery path. Inflight markers are written at review start and cleared on
completion (code-review-state.ts, code-review-orchestrator.ts).

Per-file hard timeout now aborts the CallHandle before deleting the session
(code-review-inproc.ts, style-guide-inproc.ts): session.delete() rejects
while an operation is active, so abort must settle the operation first.
Fixes zombie session leak on timeout.

Env-overridable concurrency and file timeout (lib/env.ts): both specialists
read CODE_REVIEW_CONCURRENCY, CODE_REVIEW_FILE_TIMEOUT_MS,
STYLE_GUIDE_CONCURRENCY, STYLE_GUIDE_FILE_TIMEOUT_MS so local dev can run
lower values without touching prod defaults.

Staged-diff cleanup in style-guide-specialist.ts: removeWorkspacePath in
try/finally so the run-scoped SQLite rows are freed after each review.
Added removeWorkspacePath to connectors/cloudflare-shell.ts.

Wrangler: cron trigger every 2 min, DOCS_FLUE_BASE_URL var for watchdog.
Tooling: flue:reset:local script, NODE_OPTIONS on flue:dev, fixed
clear-r2-pr-data.ts to cover both state paths and inflight/ prefix.
The scheduled watchdog was never functional in prod: Flue drops non-HTTP
handlers from app.ts (they require cloudflare.ts), so the cron fired every
2 minutes and logged "Handler does not export a scheduled() function" 30
times per hour without ever invoking runReviewWatchdog().

Reviews stopped getting stuck independently — the skill rewrite + per-file
session.delete() + concurrency limits reduced orchestrator interruptions to
near-zero. The watchdog's run-state detection and re-admission path were
also untested end-to-end, so enabling it carried real risk.

Removes: review-watchdog.ts, review-watchdog-decide.ts, run-state.ts, all
inflight marker API from code-review-state.ts, all marker call sites from
code-review-orchestrator.ts, triggers.crons + vars.DOCS_FLUE_BASE_URL from
wrangler.jsonc. Reverts app.ts to canonical export default app.

Keeps: markAutoReviewCompleted, renderFailureComment, bypassReviewLimit
(all used independently). bin/clear-r2-pr-data.ts inflight/ handling kept
for one-time cleanup of orphaned prod R2 markers.
An absent triggers key leaves existing cron triggers intact on deploy;
triggers.crons=[] forces wrangler to PUT an empty schedule list, actually
deregistering the leftover */2 cron.
1. Extract withConcurrency to lib/inproc-utils.ts; import from both
   code-review-inproc.ts and style-guide-inproc.ts instead of duplicating.

2. Fix read_repo_file tool UTF-8 decoding: use TextDecoder instead of bare
   atob() so non-ASCII content (em dashes, CJK, smart quotes) is not
   mojibake. Also encode path segments with encodeURIComponent, matching
   the getRepoFileContent fix already applied to github.ts.

3. Fix parseAddedLines: skip lines starting with backslash ('\ No newline
   at end of file') instead of treating them as context lines. Previously
   this incremented newLine, producing off-by-one line numbers for every
   subsequent added line in files without a trailing newline.

4. Fix stale comments: code-review-diff.ts, style-guide-inproc.ts, and
   wrangler.jsonc v4 comment still described the old in-orchestrator-DO
   architecture; updated to reflect the specialist-DO split.
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 26, 2026
@cloudflare cloudflare deleted a comment from ask-bonk Bot Jun 26, 2026
@cloudflare cloudflare deleted a comment from ask-bonk Bot Jun 26, 2026
…d-mode commands

Routes code review to one of two strategies based on combined diff size:

Fan-out (≤50 KB): per-file sessions, concurrency 5, cap 20 files. Good for
smaller PRs; maximum per-file detail, parallel sessions, per-file degradation.

Holistic (>50 KB): single session over the entire PR diff in one pass, cap 50
files (lib/code-review-holistic.ts). Enables cross-file reasoning, no per-file
setup overhead, faster and more reliable on large PRs where fan-out wedges.
New code-review-holistic skill: rich substance from the per-file skill, reframed
for whole-PR review, model-owned navigation.

Comment heading reflects which ran: 'Holistic Code Review' or
'Fan Out Code Review'. Both paths log inputTokens/totalTokens for calibration.

Two new codeowner slash commands:
  /fan-out-review  — forces fan-out regardless of diff size (with warning in
                     the comment that it may be slow/fail on large PRs)
  /holistic-review — forces holistic regardless of diff size

forceReviewMode threads through orchestrate → orchestrator → review-specialist
→ code-review-specialist, overriding size-based routing when set.

Also: selectCodeReviewFiles takes an optional maxFiles param (holistic passes
50, fan-out uses the existing 20 default); per-file usage logging (inputTokens,
totalTokens) added to reviewSingleFile; progress logs committed from the
previous session also included here.
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 26, 2026
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 26, 2026
mvvmm added 4 commits June 26, 2026 14:07
…; fix partial run consuming auto-review slot

/fan-out-review and /holistic-review now check getPullRequest().user.login
for dependabot[bot] before dispatching, matching the /review and /full-review
pattern. Dependabot PRs route to /workflows/dependabot-review without
forceReviewMode; regular PRs keep the forced mode.

markAutoReviewCompleted now only fires when both codeOutcome.ok and
styleOutcome.ok are true. Previously a partial run (one specialist degraded)
still consumed an auto-review slot, which blocked the retry the comment
promised would happen on the next push.
… workflow

Remove the 20-minute specialist poll from the code-review orchestrator
(root cause of stuck reviews when the orchestrator DO was interrupted
mid-wait). Replace with an event-driven finalize pattern:

- Orchestrator is now dispatch-only: limit check, placeholder, diffMode,
  write context.json to R2, admit both specialists fire-and-forget, return.
- Each specialist wraps its review in try/catch (degraded result on error),
  writes its stream result to R2, then races for a conditional-PUT finalize
  lock (If-None-Match: *). Exactly one specialist wins.
- New finalize-review workflow (FlueFinalizeReviewWorkflow, v6 migration):
  reads context + stream results from R2, head-guards against newer pushes,
  idempotency-guards against double-finalize, reconciles both streams,
  persists review JSON, renders and posts the comment, swaps reactions,
  marks auto-review slot consumed, cleans up the pending namespace.
- reviewMode is carried through context.json so finalize always uses the
  same mode the orchestrator decided on (fixes log-vs-comment split in
  local dev where DO env views can differ).
- dispatchId scoping isolates concurrent same-head dispatches (e.g.
  auto-review + /full-review arriving simultaneously).
- Zero new custom Durable Object bindings beyond the standard Flue
  workflow class (FlueFinalizeReviewWorkflow in wrangler.jsonc v6).
…lize

A specialist DO that is hard-evicted mid-review never reaches the
writeStreamResult call at the end, leaving no code.json/style.json for
the other specialist to detect and claim the finalize lock — exactly
the stuck-review scenario seen on PR #31736.

Fix: each specialist now writes { ok: false, degradedResult } to R2
BEFORE starting its review. If the DO is evicted, the placeholder is
already there. The placeholder is overwritten with the real result on
success. Either way finalize always has two stream results and will run,
posting a degraded-section comment rather than staying stuck forever.

Also fixes the unused FinalizeContext import lint error.
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 26, 2026
The placeholder writes added in the previous commit caused a race: both
specialists write their degraded placeholders within ~1-2s, the first to
write checks the sibling's placeholder (which looks like a valid result),
claims the lock, and finalize fires immediately with two ok:false results
before either review ran — posting an instant failure comment.

Fix:
- Add final:boolean to StreamResultPayload (false=placeholder, true=real)
- tryClaimFinalize now checks final:true on the sibling — a placeholder
  (final:false) is not treated as 'done', preventing premature triggering
- Orchestrator writes both crash-protection placeholders (final:false)
  BEFORE admitting specialists so a key exists even if a specialist DO
  is evicted immediately after admission
- Specialists write final:true in their rendezvous tail (removed the
  redundant per-specialist placeholder writes added previously)
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 26, 2026
…refactors

A. Robustness (stuck-review fixes):
- Wrap all specialist setup (token, diff fetch) + review in one try/catch;
  any setup failure now writes a final:true degraded result and triggers
  finalize rather than throwing and leaving the placeholder stuck at final:false.
  Extract shared reportSpecialistResult() helper into lib/finalize-rendezvous.ts
  to eliminate the duplicated rendezvous tail across both specialists.
- Orchestrator admit failures overwrite the placeholder with final:true degraded;
  if both fail, orchestrator itself claims the finalize lock and admits finalize.
  Drop redundant type casts on the narrowed admit union.
- finalize-review post-comment failure now returns {finalized:false} and skips
  markAutoReviewCompleted and cleanupPending so the next push can retry without
  burning an auto-review slot on an undelivered comment.
- Degraded streams (ok:false) skip reconciliation and carry previous findings
  forward as active, preventing false resolution of prior issues.

B. Defensive R2 / JSON:
- Wrap obj.json() in try/catch in readContext, readStreamResult, and
  tryClaimFinalize; treat corrupted/partial writes as null/not-final.

C. Safety:
- Guard req in orchestrator with an explicit throw instead of req!.
- safeOrigin() helper returns '' on URL parse failure in both specialists.
- parseReviewSpecialistPayload validates pr field shapes and DiffMode union
  (incremental requires fromSha+toSha strings); normalizes baseUrl to a
  valid absolute http(s) origin via new URL().origin, dropping invalid values.

D. Correctness:
- Holistic reviewedFiles now built from patched files only, matching the diff
  actually sent to the model.
- clearTimeout called immediately after await handle resolves in both holistic
  and per-file reviewSingleFile, preventing a late timer from mislabeling a
  post-resolve error as a timeout.
- Finding ID hash widened from 6 to 12 hex chars (24→48 bit) in both CR-/SG-
  namespaces; line number deliberately kept excluded for cross-commit stability.
- withConcurrency clamps limit to Math.max(1, Math.floor(limit)).

E. Dev script (bin/clear-r2-pr-data.ts):
- WHERE clause tightened to 'diffs/pr-%' and 'inflight/pr-%' (was 'diffs/%').
- Switch execSync string interpolation to execFileSync with argument arrays.
- Freshen stale header comment.

F. Refactors:
- Collapse four near-identical /review /full-review /fan-out-review
  /holistic-review handlers in orchestrate.ts into one shared helper.
- Merge duplicate cloudflare-shell imports in style-guide-specialist.ts.

G. Docs:
- AGENTS.md: add finalize-review row to workflow table; rewrite request flow
  to describe dispatch-only orchestrator, R2 rendezvous, final:bool flag, and
  finalize-review steps; add v6 migration; update slash-command list.
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 26, 2026
mvvmm added 4 commits June 26, 2026 18:27
…th spiral

The FlueCodeReviewSpecialistWorkflow and FlueStyleGuideSpecialistWorkflow
DOs accumulated SQLite state from previous crashed runs. Each OOM leaves
orphaned session data in the DO's storage. After several crashes the
accumulated state is large enough that even the Flue alarm recovery handler
OOMs on startup (~900ms, repeating every ~30-60s), creating a death spiral
where no future review can complete.

Fix: two-step migration to reset the storage:
  v7 — rename bloated instances to *Old (abandons their storage)
  v8 — recreate the classes with empty storage under the original names

The wrangler bindings still point to the original names (FlueCodeReview-
SpecialistWorkflow / FlueStyleGuideSpecialistWorkflow), so no code changes
are needed. Run history prior to v7 is lost.
… rendezvous, style-guide rule additions

- Generalise finalize rendezvous from 2 streams to N streams (EXPECTED_STREAMS
  constant; tryClaimFinalize loops all siblings in parallel; stream type widened
  to string; expectedStreams threaded through context, payload, and options)
- Add conventions specialist (light AI session, conventions-check skill):
  checks PR title format, Summary content against fetched template, and redirect
  checklist when docs files are renamed/deleted
- Add redirect specialist (pure TypeScript, no model): derives old+new URLs from
  file paths, skips format-only moves (foo.mdx → foo/index.mdx), splat-aware
  __redirects matching, degrades gracefully if file unreadable
- Redirects reconcile deterministically in finalize (no LLM); conventions force
  full diff mode; bothFailed gate remains code+style only
- Add previous_filename to PullRequestFile for correct rename handling
- Add marketing-language and time-sensitive-content suggestions to style-guide
  core-content reference; add component should-use suggestions to code-blocks
- Render: 4 sections (Code → Conventions → Style → Redirects); pr path renders
  as PR label; no-findings messages use <sub> (smaller, no emoji); top-level
  status line unchanged
- wrangler.jsonc v7: new_sqlite_classes for both new DO classes
- Fix idempotency guard to skip in log mode (guard is comment-mode only)
Codeowners can comment /disable-auto-review on a PR to stop push-triggered
reviews from running. The flag is stored in R2 at
diffs/pr-<n>/auto-review-disabled.json (same pattern as ignore-review-limit).

Manual /review and /full-review commands (bypassReviewLimit=true) still work
normally. The command is acknowledged with a thumbs-up reaction.
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 27, 2026
mvvmm added 2 commits June 27, 2026 14:32
Holistic mode timed out on large PRs (>50 KB diff, single model pass,
15-min hard timeout) producing zero findings. Fan-out is strictly better:
bounded per-file sessions, graceful per-file degradation, guaranteed
completion on any diff size.

- Delete lib/code-review-holistic.ts and .agents/skills/code-review-holistic/
- code-review-specialist.ts: always fan-out; remove routing logic, holistic
  imports, env reads, forceReviewMode handling
- lib/code-review-inproc.ts: remove CODE_REVIEW_HOLISTIC_* constants and
  re-export; remove reviewMode from both return sites
- lib/code-review-results.ts: remove reviewMode field from CodeReviewResult
- lib/review-specialist.ts: remove forceReviewMode from payload type + parser
- code-review-orchestrator.ts: remove forceReviewMode from payload + specialistBody
- orchestrate.ts: remove /fan-out-review and /holistic-review commands
- code-review-render.ts: heading always 'Code Review'; remove codeMode from
  RenderReviewInput; remove /fan-out-review and /holistic-review command rows
- finalize-review.ts: remove codeMode from renderComment call
- AGENTS.md: remove holistic routing description and command bullets
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 27, 2026
@cloudflare cloudflare deleted a comment from cloudflare-docs-bot Bot Jun 27, 2026
@mvvmm

mvvmm commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

/full-review

@cloudflare-docs-bot

cloudflare-docs-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Review

Reviewing new changes (commit df8c587)…


⚠️ 23 warnings, 💡 5 suggestions found in commit 86b473d.

Code Review

This code review is in beta and may not always be helpful — use your judgment.

Warnings (22)
File Issue
.flue/workflows/finalize-review.ts line 581 Incomplete payload validation — parsePayload casts payload to Partial and then reads .eventType. If payload is null, undefined, or a non-object, this throws a TypeError before the intended validation Error. It also accepts NaN because typeof NaN === 'number'. Fix: Guard that payload is a non-null object, then validate fields and ensure number is a finite positive integer.
.flue/workflows/finalize-review.ts line 186 Unhandled GitHub API failure — await getPullRequest(token, input.number) is unwrapped. A network or GitHub error aborts the workflow without running cleanupPending or consuming the auto-review slot, leaving a stale pending namespace. Fix: Wrap the call in try/catch, log the failure, run cleanupPending, and return { finalized: false, reason: 'pr_fetch_failed' }.
.flue/workflows/finalize-review.ts line 206 Unnecessary GitHub API call in log mode — getIssueComments is fetched unconditionally, but botComment is only used when reviewMode === 'comment'. In log mode this adds a needless GitHub dependency and can fail finalize when no comment would be posted. Fix: Move the getIssueComments call and botComment lookup inside the if (reviewMode === 'comment') block.
.flue/lib/code-review-inproc.ts line 67 Diff parser drops added lines starting with ++if (raw.startsWith("+++") || raw.startsWith("---")) continue; runs before the added-line check. An added source line like ++i; appears in the patch as +++i;, which starts with +++ and is skipped entirely, so it never reaches args.addedLines. Fix: Match git file headers specifically, e.g. if (raw.startsWith("+++ ") || raw.startsWith("--- ")) continue;, so only +++ b/path/--- a/path metadata lines are skipped.
.flue/lib/code-review-inproc.ts line 162 Duplicate IDs can silently drop findingsfindingsById.set(finding.id, finding) deduplicates by ID, but assignCodeReviewFindingIds hashes rule:path:evidence and deliberately excludes line. Two distinct findings in the same file on different lines with the same rule and evidence text therefore receive the same ID, and the Map keeps only the last one. Fix: Include the line number in the finding ID hash, or append a disambiguating counter to duplicate IDs while keeping a separate stable group key for reconciliation.
.flue/lib/finalize-rendezvous.ts line 371 Workflow admission failure leaves dispatch stuck — In reportSpecialistResult, after tryClaimFinalize wins the conditional lock, admitWorkflow is awaited inside the try block. If admitWorkflow throws (network error, 5xx, missing runId), the catch swallows it and returns. The finalize.lock already exists, so no other specialist can claim it later, and finalize-review is never admitted for this dispatch. Fix: Move the admitWorkflow call (and finalization log) outside the catch-all try block so admission errors propagate to the caller and can be retried, or add targeted retries before giving up on the lock.
.flue/workflows/redirect-specialist.ts line 317 Unhandled rendezvous failureawait reportSpecialistResult(...) is outside the try/catch block, so a failure while writing the result or claiming the finalize lock rejects from run instead of returning the degraded result that the catch block prepared. Fix: Wrap the reportSpecialistResult call in its own try/catch, log the finalization failure, and still return result so the orchestrator receives the degraded state.
.flue/workflows/redirect-specialist.ts line 269 Malformed redirect suggestion for cross-area renames — For a renamed docs page whose new path is outside src/content/docs/, pathToUrl(newPath) returns "" and the suggestion template produces an invalid \/old/ 301`line with an empty destination. Fix: Use a placeholder destination such as/new-destination/whenpathToUrl(newPath)` is empty, instead of interpolating an empty string into the redirect line.
.flue/workflows/code-review-orchestrator.ts line 387 Swallowed R2 write can deadlock finalize rendezvous — After a specialist admit fails, the orchestrator overwrites the stream placeholder with final: true but appends .catch(() => {}), swallowing any R2 write failure. If the write fails, the stream stays final: false, so tryClaimFinalize can never succeed and the review finalize workflow never runs. Fix: Remove the .catch(() => {}) so the orchestrator fails visibly (and can be retried), or retry the write a bounded number of times before surfacing the error.
.flue/workflows/code-review-orchestrator.ts line 196 req validation after GitHub state mutation leaves stale comment — When reviewMode === "comment", the orchestrator posts or updates the PR comment at lines 179-192 before checking if (!req) at line 196. If req is missing, the function throws after leaving a "review pending" comment that will not be finalized by any specialist workflow. Fix: Validate req and derive baseUrl before posting the placeholder comment so a missing request context never mutates GitHub state.
.flue/lib/code-review-render.ts line 240 Critical findings silently dropped outside code review — renderSection uses active.length (which includes critical findings) to decide whether to render content, but then only renders the critical table when includeCritical is true (line 248). For style/conventions/redirects, includeCritical is false, so a critical finding in those sections would cause the section heading to be emitted with no tables, and renderComment only counts critical findings from the code section (line 270), so the top-level status would also omit them. Fix: Either always render the critical table when critical.length > 0, or filter critical findings out of active when includeCritical is false so the section correctly shows "no issues" and the counts stay consistent.
.flue/workflows/code-review-specialist.ts line 182 Unhandled rejection in finalization — The await reportSpecialistResult() call at lines 182-195 is outside the try/catch block that wraps the review logic, so an R2, network, or finalization error causes the entire workflow to reject instead of returning the degraded result that was carefully set up. Fix: Move the rendezvous call inside the try block, or wrap it in its own try/catch that logs the error and continues, ensuring the degraded result is always returned as the function's output.
.flue/lib/review-specialist.ts line 107 Incomplete PR metadata validation — parseReviewSpecialistPayload validates pr.number, pr.title, pr.base, pr.head, and pr.labels, but ReviewSpecialistPrMeta also requires pr.body and pr.author to be strings. A malformed payload can pass validation with missing or non-string values. Fix: Add typeof checks for pr.body and pr.author, or normalize missing values to empty strings.
.flue/lib/review-specialist.ts line 160 Unchecked expectedStreams element types — expectedStreams is conditionally returned as string[] via an unchecked cast of input.expectedStreams. If array elements are not strings, downstream rendezvous logic may fail or behave unexpectedly. Fix: Validate every element with typeof x === "string" before returning the array.
.flue/lib/style-guide-inproc.ts line 294 Race condition in timeout detection — The setTimeout callback sets timedOut = true before calling handle.abort(). If a non-timeout error occurs just before the timer fires, the callback can still run and set the flag before the rejected await handle reaches the catch block, causing a genuine failure to be reported and rethrown as a timeout. Fix: Determine timeout from the abort/error itself (for example, only set timedOut if handle.abort() actually aborts a pending operation, or by inspecting the error type) instead of relying on a shared boolean flag that can be set after the operation has already failed.
.flue/AGENTS.md line 7 Architecture omits half the review pipeline — Line 7 claims the bot runs 'two independent reviews ... Code Review and ... Style Guide Review'. Lines 18-26 list only those two specialist workflows. workflows/code-review-orchestrator.ts admits four specialists (/workflows/code-review-specialist, /workflows/style-guide-specialist, /workflows/conventions-specialist, /workflows/redirect-specialist), finalize-review.ts reconciles all four streams, and lib/code-review-render.ts emits four sections (Code Review, Conventions, Style Guide Review, Redirects). Line 83 also omits migration v7, which added FlueConventionsSpecialistWorkflow and FlueRedirectSpecialistWorkflow per wrangler.jsonc. Fix: Update the architecture description, workflow table, request flow, R2 state shape ({ code, style, conventions, redirects }), rendered sections, and migration history to include conventions-specialist and redirect-specialist.
.flue/workflows/orchestrate.ts line 222 Unhandled promise rejection — addReactionToComment(..., 'eyes') is awaited without error handling in the /review and /full-review handler. If the reaction request fails, the entire slash command aborts and no review is dispatched. Fix: Wrap the reaction call in a try/catch or add a .catch so reaction failures do not block review dispatch.
.flue/workflows/orchestrate.ts line 311 State change before acknowledgment — setReviewLimitIgnored persists state before the +1 reaction is added. If addReactionToComment rejects, the review-limit flag remains set but the webhook returns a failure and may be retried. Fix: Either add the reaction first or catch reaction failures after state is persisted to avoid spurious webhook retries.
.flue/workflows/orchestrate.ts line 356 State change before acknowledgment — setAutoReviewDisabled persists state before the +1 reaction is added. If addReactionToComment rejects, auto-review remains disabled but the webhook returns a failure and may be retried. Fix: Either add the reaction first or catch reaction failures after state is persisted to avoid spurious webhook retries.
.flue/bin/clear-r2-pr-data.ts line 74 Unhandled external command failure — The sqlite3 calls via execFileSync (count query and DELETE) have no error handling. If sqlite3 is not installed, the database is locked, or the _mf_objects table does not exist, the thrown exception crashes the script. Fix: Wrap the execFileSync calls in try/catch and emit a clear, actionable error message (e.g., indicate sqlite3 is missing or the DB is locked).
.flue/lib/code-review-results.ts line 52 Non-unique finding IDs for distinct findings — The ID hash input is only ${f.rule}:${f.path}:${f.evidence.trim()} (line 52), excluding the line number. Two distinct findings in the same file that share the same rule and evidence text will therefore produce the same CR-${hex.slice(0, 12)} ID (line 57), breaking the expectation that each finding has a unique ID. Fix: Include the line number in the hash input, or check for duplicate IDs after assignment and append a counter to duplicates, so distinct findings remain distinguishable while keeping IDs stable.
.flue/lib/code-review-state.ts line 125 Race condition in read-modify-write — markAutoReviewCompleted reads the counter from R2, mutates the count and shas array in memory, and then writes the whole object back with an unconditional bucket.put. Two concurrent review runs for the same PR can both read the same state, both pass the shas.includes(headSha) check, and both write back, causing the second put to overwrite the first. This can double-increment the count and/or lose a recorded SHA. Fix: Use a conditional R2 put (e.g., if-match ETag returned by bucket.get) and retry on conflict, or serialize updates through a Durable Object to make the read-modify-write atomic.
Suggestions (5)
File Issue
.flue/lib/finalize-rendezvous.ts line 257 R2 list pagination not handled — cleanupPending calls bucket.list({ prefix }) once and deletes only listed.objects. R2 list responses are paginated; if a prefix ever contains more than the default page size, truncated results and the cursor are ignored, leaving keys behind despite the comment claiming it deletes every key under the prefix. Fix: Loop while listed.truncated is true, passing listed.cursor to subsequent bucket.list calls, and collect/delete all objects across pages.
.flue/workflows/conventions-specialist.ts line 115 Misleading variable name and duplicated path logic — 'renamedDocFiles' is built from both 'renamed' and 'removed' files, and the same previous_filename/filename branch is repeated in the .filter() predicate and the .map() transform. Fix: Rename the variable (and skill argument) to something like 'docsFilesNeedingRedirect' and compute the old path once, e.g. via map+filter or a helper, to avoid duplicating the branching logic.
.flue/AGENTS.md line 73 Slash command missing from docs — The slash commands section lists /review, /full-review, and /ignore-review-limit, and says 'All commands swap 👀 → 👍'. workflows/orchestrate.ts also handles /disable-auto-review, and lib/code-review-render.ts includes it in the rendered Commands block. Fix: Add /disable-auto-review to the slash commands list with a brief description.
.flue/workflows/orchestrate.ts line 283 Duplicated slash-command handler — The /ignore-review-limit and /disable-auto-review handlers duplicate comment id validation, sender validation, token setup, codeowner check, R2 bucket access, +1 reaction, logging, and return shape. Fix: Extract a shared helper that takes the command name and a state-mutator callback to reduce duplication and future drift.
.flue/bin/clear-r2-pr-data.ts line 42 Directory type not verifiedexistsSync(stateDir) only checks that the path exists, not that it is a directory. If a candidate path exists as a file, the subsequent readdirSync(stateDir) call will throw ENOTDIR. Fix: Use statSync(stateDir).isDirectory() instead of existsSync(stateDir).

Conventions

Checks PR title, description, and redirect checklist.

Warnings (1)
File Issue
PR Summary section content — The PR description does not contain a ### Summary section; the first heading is ## Architecture. Fix: Add a ### Summary section to the description with context about the purpose of these changes.

Style Guide Review

No style-guide issues found.

Redirects

No missing redirect entries found.

Commands

Only codeowners can run commands. Post a comment with the command to trigger it.

Command Description
/review Runs a review now. Incremental if a prior review exists, full if not.
/full-review Re-reviews the entire PR diff from scratch, ignoring incremental history. Useful after a rebase, when you want a fresh review, or if the bot gets out of sync and reports issues that no longer exist.
/ignore-review-limit Permanently lifts the 2-review automatic limit for this PR. Future pushes will trigger reviews as normal.
/disable-auto-review Stops automatic reviews from triggering on future pushes to this PR. Codeowners can still run /review or /full-review manually.

mvvmm added 2 commits June 27, 2026 18:21
Replace rigid format rules (title must start with [Product] or type:,
description must have a ### Summary section) with lenient semantic checks:

1. Product/area identified — title OR description names the area; no tag required
2. Description explains the work — any human prose passes; only flag truly empty/placeholder descriptions
3. Scope accuracy — only flag material misrepresentation, not missing details

Default to no finding in all three. Only flag clear, significant violations.
@mvvmm

mvvmm commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

/disable-auto-review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants