Don't Let the LLM Verify. Make It Build the Verifier.
Someone posted about asking Claude to generate an HTML report from a JSON file, then spawning 4 agents to test it. All 4 reported success. Manual testing showed a failure rate above 60% — hallucinated selectors, fake IDs, wrong values.
When you ask an LLM to “check if this is correct,” it predicts what a correct-sounding check looks like. That’s not the same as actually checking.
What I do instead
I tell the agent to write a script that performs the check, then run the script. Not through the LLM — just execute it.
The obvious examples: lint and formatting. I don’t ask Claude “is this formatted correctly?” I have it run eslint and prettier. The tool tells me if it passes or not.
This works for ad-hoc checks too. Say I need to verify an HTML report pulls the right values from a JSON source. I tell Claude to write a script that parses the JSON, queries the DOM with a real parser, compares expected vs. actual, and prints mismatches. Then run it. Same input, same output every time.
The pattern applies anywhere there’s a ground truth to check against — data validation, math, DOM structure, spelling, broken links. All of these have real tools or can be checked with a short script. The LLM writes the script. The script does the verification.
It’s also cheaper. The script runs without burning tokens, and once you have it, you can improve it and rerun it as many times as you want. Having the LLM re-check means paying for a fresh prediction every time — and getting a different answer each time too.
Why more agents don’t help
Four agents running the same prediction process give you four predictions. If the model hallucinates a selector, it hallucinates it four times. More agents just means synchronized fiction.
Agents help when each one runs a real tool and reports the output. The agent orchestrates. The tool verifies.