Using a Second LLM to Review Your Coding Agent's Work
Different LLMs think differently. When one gets stuck, it tends to bang its head against the wall — trying the same approach over and over. A second model often sees the problem from a different angle and breaks through.
I use Droid (which runs Claude) as my primary coding agent and Codex as a reviewer. The idea is simple: after Droid makes changes, I have Codex review the dirty diff before I commit. I set this up using skills — reusable prompt snippets that all three of my agents understand.
The “Review Dirty” Skill
This skill takes all uncommitted changes and sends them to Codex for review:
---
name: review-dirty
description: Review dirty code changes. When the user says "review",
"review changes", or "review dirty code"
---
All dirty repo changes are likely made in this session,
though not always.
If you are Codex, just review the dirty code and ignore the
rest of this skill. If you are not Codex, continue:
Do not modify anything unless I tell you to. Run this CLI
command (using Codex as our reviewer), passing in the original
prompt, to review the changes: `codex exec "Do not modify
anything unless I tell you to. Review the dirty repo changes
which are to implement: <prompt>"`. $ARGUMENTS. Do it with
the Bash tool. Make sure any timeout is at least 10 minutes.
The “if you are Codex” guard is there because when Droid calls `codex exec`, Codex picks up the same skill files. Without it, Codex would try to call itself recursively.
I just say “review” or “review dirty” and Droid shells out to Codex, which reads the git diff and gives its assessment.
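Concretely, here is a sketch of what that expands to. The task text is hypothetical (a real run substitutes the session’s original prompt), and the `codex exec` call is left commented out so the sketch runs without the CLI installed:

```shell
# Hypothetical task text standing in for the session's original prompt.
task="add retry logic to the fetcher"

# The review prompt the skill builds around it.
prompt="Do not modify anything unless I tell you to. Review the dirty repo changes which are to implement: ${task}"
echo "$prompt"

# In the real skill, the agent runs it via its Bash tool, e.g.:
#   timeout 600 codex exec "$prompt"
# (timeout 600 enforces the 10-minute minimum mentioned above.)
```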
Taking It Further: Review and Fix in a Loop
Once you have a review skill, the next step is obvious — automate the fix-review cycle:
---
name: review-plus-fix-relentlessly
description: Review dirty code and fix iteratively. When the user says
"loop to fix dirty" or "review+fix"
---
All dirty repo changes are likely made in this session,
though not always.
Use the review-dirty skill to review changes, and fix them to
the best of your ability, matching repo preferences and style.
After fixing, run review-plus-fix-relentlessly again, and
before each cycle report how many review+fix cycles we
have done.
Stop when the code review skill doesn't produce any more
things to fix.
This creates a loop: Droid makes changes, Codex reviews, Droid fixes what Codex flagged, Codex reviews again. It keeps going until Codex has nothing left to flag.
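The control flow can be sketched in plain shell. Here `findings` is a stub counter standing in for Codex’s actual review output, which the agent would interpret on a real run:

```shell
cycle=0
findings=2   # stub: pretend the first review flags two issues

while [ "$findings" -gt 0 ]; do
  cycle=$((cycle + 1))
  echo "review+fix cycle $cycle: $findings issue(s) flagged"
  # stub fix step: a real run would fix, then re-invoke `codex exec`
  findings=$((findings - 1))
done

echo "clean after $cycle cycle(s)"
```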
Why This Works
The value isn’t just catching bugs — it’s that the two models have different blind spots. Claude might over-engineer a solution while Codex points out a simpler approach. Or Droid might miss an edge case that Codex catches because it’s looking at the code fresh.
The review loop usually converges in 2-3 cycles.
Setup
I wrote about skills and my shared agent setup previously. Drop the skill files in your skills directory and they’re available to all your agents.
For this to work, you need both Droid (or Claude Code) and Codex installed, with Codex accessible via `codex exec` from the command line.
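A quick preflight check, assuming both CLIs should be on your PATH (the binary names `droid` and `codex` are my assumption; adjust to match your install):

```shell
# Report whether each agent CLI is reachable; neither needs to be
# installed for this check itself to run.
for cmd in droid codex; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: missing"
  fi
done
```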