Claude Code vs Cursor vs Codex: Which AI Agent Should You Use?
I use Claude Code, Droid, and Codex daily across all my projects. I also ship a SaaS starter kit — Stacknaut — that comes with an AGENTS.md pre-configured for coding agents.
These are practical daily-use observations from working with these agents on production codebases with skills, project instructions, and defined conventions.
Claude Code
Claude Code is the most capable agent I use for working with a structured codebase. It reads CLAUDE.md at session start (mine is just a pointer that imports AGENTS.md via @AGENTS.md) and follows instructions consistently.
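The pointer file can be a one-liner; Claude Code expands @-references when it loads CLAUDE.md:

```markdown
<!-- CLAUDE.md -->
@AGENTS.md
```

Everything else lives in AGENTS.md, so the other agents read the same conventions.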
What it does well:
- Follows AGENTS.md conventions reliably — coding style, commit format, tool preferences
- Reads and understands project structure quickly. Greps for patterns, reads relevant files, builds a mental map
- Handles multi-step tasks well — “add a new API endpoint with tests, types, and a frontend page” works in a single prompt
- Git integration is solid — commits, branches, diffs
- Skills work naturally. Trigger a skill, the prompt gets injected, the agent follows it
Where it struggles:
- Context window fills up fast on large codebases. Long sessions degrade. I start fresh sessions often
- Sometimes over-reads files — pulls in more context than needed, burning through the window
- Can be cautious about running commands, asking for approval when I'd prefer it to just go ahead. I use `--dangerously-skip-permissions` for trusted projects. I used to route Claude Max through Droid via CLIProxyAPI, partly because Droid has a stronger permission system, but Anthropic has since restricted Claude Max use with third-party tools
With a structured codebase: Claude Code handles Stacknaut’s monorepo structure (frontend/backend/shared) naturally. It understands the path aliases, knows to run type-check in both packages, and follows the Drizzle ORM patterns without me repeating the rules. The pre-configured AGENTS.md means the first session on a new project based on Stacknaut is already productive — no warmup needed.
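A sketch of the kind of AGENTS.md rules that make this work. The package names and commands below are illustrative, not Stacknaut's actual configuration:

```markdown
## Conventions (excerpt)
- Monorepo layout: frontend/, backend/, shared/
- After any change touching shared/, run type-check in BOTH packages:
  - cd frontend && npm run type-check
  - cd backend && npm run type-check
- Database access goes through Drizzle ORM; never write raw SQL in route handlers
```

Rules written this way are declarative enough that any agent reading the file can follow them without extra prompting.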
Codex
Codex is OpenAI’s open source agent. I use it for reviewing code that Claude Code wrote, bug fixing, and tackling tasks where I want a different perspective.
What it does well:
- Fast for targeted tasks. “Review this file for issues” or “refactor this function” — Codex is snappy
- Good at catching things other agents missed. Different model, different blind spots
- Reads AGENTS.md and follows basic conventions
- The sandbox is a nice safety net — commands run in an isolated environment by default
- Open source, so I can see exactly what it’s doing
Where it struggles:
- Less capable at multi-step agentic workflows than Claude Code. It handles short task chains well but loses the thread on longer, complex ones
- Skills work, but they're less polished than in Droid and Claude Code. I share skills across all three, though Codex sometimes needs more nudging to follow them
- The sandbox, while safe, sometimes prevents it from doing things I want — accessing the network, running the dev server, interacting with Docker
With a structured codebase: Codex works well for targeted edits within Stacknaut — fixing a bug, adding a field, updating a component. For bigger tasks like “add a new billing plan with Stripe integration across frontend, backend, and shared types,” I reach for Claude Code. Codex tends to need more prompting to coordinate across a monorepo.
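Sharing skills across agents works best when each skill is a small, self-contained prompt file. A minimal sketch in the SKILL.md format Claude Code uses; the skill name and steps here are invented for illustration:

```markdown
---
name: review
description: Review a pending diff for bugs, type issues, and convention violations
---

When this skill is triggered:
1. Run `git diff` to collect the pending changes.
2. Check each change against the conventions in AGENTS.md.
3. Report issues as a numbered list, most severe first.
```

Because the body is plain instructions, the same file can be handed to Codex or Droid with little or no adaptation.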
Cursor
Cursor is the best IDE-embedded agent, but I keep coming back to terminal agents.
What it does well:
- Tab completion is genuinely good for small, predictive edits while you’re actively writing code
- Inline diffs are nice to review — you see the changes in context without switching tools
- Reads project rules (`.cursor/rules`, AGENTS.md) and follows conventions
- The Composer/Agent mode handles multi-file edits within the IDE
- Background agents and Bugbot for automated tasks
Where it struggles:
- Primarily an editor experience. Cursor has a CLI and background agents now, but the core workflow is still VS Code. I use WebStorm and Neovim — Cursor means giving those up
- Parallel sessions are less natural. Background agents help, but with terminal agents I run 3-5 in tmux and coordinate between them. That’s harder to replicate in an editor
- Project rules work for conventions, but there’s no skills system like Claude Code or Droid have — small, portable prompts I can trigger on demand and share across agents
With a structured codebase: Cursor handles a starter kit fine for single-file edits. Where it falls short is the agentic workflow I actually use — having an agent autonomously implement a feature across the monorepo, run tests, check types, fix errors, and commit. That workflow needs a terminal agent that can loop independently.
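That loop (implement, run checks, feed failures back, repeat) is straightforward to script around a terminal agent. A minimal sketch in Python, with placeholder shell commands standing in for real test and type-check suites, and a stub where the agent invocation would go:

```python
import subprocess

def first_failure(checks):
    """Run each check command in order; return (cmd, output) for the first failure, or None."""
    for cmd in checks:
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            return cmd, result.stdout + result.stderr
    return None

def agent_loop(fix, checks, max_rounds=5):
    """Alternate a fix step with the check suite until everything is green."""
    for _ in range(max_rounds):
        failure = first_failure(checks)
        if failure is None:
            return True   # all checks pass; safe to commit
        fix(*failure)     # hand the failing command and its output to the agent
    return False

# A real setup would use e.g. ["npm test", "npm run type-check"]; "true" always passes.
print(agent_loop(lambda cmd, output: None, ["true"]))
```

The `fix` callback is where a terminal agent gets re-invoked with the failing command and its output; an editor-embedded agent can't easily run this loop unattended.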
Droid
Droid was my primary agent. It reads both AGENTS.md and CLAUDE.md, supports skills and custom droids, and has good context management.
What it does well:
- Model-agnostic — I can use different models for different tasks
- Skills and custom droids work well for repeatable workflows. I have droids for review, exploration, and specific project tasks
- Spec mode lets me plan before coding — useful for complex features where I want to review the approach before the agent starts writing
- Sub-agents via the Task tool — delegate subtasks to separate instances
- Good at following AGENTS.md conventions, especially with project-specific custom droids
Where it struggles:
- Newer than Claude Code, so the ecosystem and documentation are still growing
- Some rough edges in session management compared to Claude Code
With a structured codebase: Droid works particularly well with Stacknaut because I can create project-specific droids that know the codebase deeply. A custom droid configured for “add a new API endpoint” knows the exact file patterns, the route structure, the type definitions, and the test setup. It goes beyond what AGENTS.md alone provides.
How I Actually Use Them Together
I don’t pick one agent. I use all three:
- Claude Code for primary development — implementing features, working through complex tasks, using skills for commit/review/deploy workflows
- Droid for an alternative perspective and when I want spec mode or custom droids for specific workflows
- Codex for review — I have it check what the other agents wrote. Different model catches different issues
The shared AGENTS.md means all three agents follow the same conventions. The code they produce is consistent regardless of which agent wrote it. That’s the whole point of having project instructions — it normalizes the output across agents.
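One way this sharing can be wired, with AGENTS.md as the single source of truth (the layout is illustrative; Codex and Droid read AGENTS.md directly):

```
repo/
├── AGENTS.md        # single source of truth: conventions, commands, style
├── CLAUDE.md        # contains just "@AGENTS.md" so Claude Code imports the same file
└── .cursor/
    └── rules/       # Cursor rules can restate or reference AGENTS.md
```

Keeping the content in one file means a convention change propagates to every agent at once.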
Which Should You Pick?
If you’re working with a structured codebase — starter kit or not — start with Claude Code. It’s the most capable, most polished, and the AGENTS.md support is mature.
Add Codex as a reviewer. Having a second agent review the first agent’s work is one of the most reliable quality improvements I’ve found.
If you want skills and custom agents, try Droid. Project-specific droids that know your exact patterns go beyond what AGENTS.md alone provides.
Cursor is fine if you prefer staying in VS Code. It’ll follow your project conventions. But you’ll miss the composability and parallel sessions of terminal agents.