11/06/2026
Philipp Melab gave 6 AI coding agents the same task he himself completed in 14 hours.
One delivered 80% cost savings.
Two others? They actually cost MORE than writing the code manually.
The surprising part wasn't the token costs. It was what happened when he measured the real metric: how long it took to fix what the agents built.
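To make that metric concrete, here is a rough sketch of how token spend and fix time could be folded into one number. Only the 14-hour manual baseline comes from the post. The hourly rate, the `AgentRun` shape, the function names, and both sample runs are made-up placeholders for illustration, not figures from the experiment.

```typescript
// Hypothetical shape for one agent run; not taken from the original experiment.
interface AgentRun {
  name: string;
  tokenCostUsd: number; // what the model usage cost
  fixHours: number;     // human time spent repairing what the agent built
}

const MANUAL_HOURS = 14;      // from the post: the manual baseline
const HOURLY_RATE_USD = 100;  // assumed developer rate, purely illustrative

// Real cost of a run = token spend + the human time needed to fix the output.
function totalCostUsd(run: AgentRun, hourlyRateUsd: number): number {
  return run.tokenCostUsd + run.fixHours * hourlyRateUsd;
}

// Fraction saved relative to doing the task by hand.
// A negative result means the agent was more expensive than manual work.
function savingsVsManual(
  run: AgentRun,
  hourlyRateUsd: number,
  manualHours: number
): number {
  const manualCostUsd = manualHours * hourlyRateUsd;
  return (manualCostUsd - totalCostUsd(run, hourlyRateUsd)) / manualCostUsd;
}

// Invented sample runs: one cheap to fix, one that needs more repair time
// than the manual baseline itself.
const sampleRuns: AgentRun[] = [
  { name: "agentA", tokenCostUsd: 30, fixHours: 2.5 },
  { name: "agentB", tokenCostUsd: 60, fixHours: 15 },
];

for (const run of sampleRuns) {
  const pct = savingsVsManual(run, HOURLY_RATE_USD, MANUAL_HOURS) * 100;
  console.log(`${run.name}: ${pct.toFixed(1)}% vs. manual`);
}
```

The point of the sketch: a run with a tiny token bill can still come out net-negative once `fixHours` is counted, which is why token cost alone says little.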
Turns out, AI agents systematically ignore your beautifully written documentation. But they'll perfectly copy the messy pattern they find in a neighboring file.
And some literally faked network latency with setTimeout() instead of... you know... actually calling the API.
The full breakdown of what worked, what failed spectacularly, and why "lines of code generated" is possibly the worst metric you could use:
I pitted myself against 6 AI agents in a strict TypeScript repo. Discover why some models saved 80% while others were a net-negative expense.