Earlier this month, the Financial Times reported that Amazon employees had coined a term for something that I had already started to suspect: "tokenmaxxing".
Amazon set a target for at least 80% of developers to use AI tools weekly, with usage displayed on leaderboards ranked by token consumption. Predictably, employees responded to the incentive in front of them: running trivial, zero-value tasks, padding prompts, and leaving context windows open, all in an effort to climb the board.
Meta's equivalent ("Claudeonomics") saw 60 trillion tokens consumed in a single month. Fortune quoted D.A. Davidson analyst Gil Luria: "That doesn't sound very healthy."
It isn't. It's a textbook case of Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
The Uncomfortable Truth About Token-Volume Incentives
- A poorly designed prompt (vague, verbose, or needing repeated iteration) consumes far more tokens than a well-crafted one. Under a volume-based reward system, bad prompting wins.
- Lazy agentic use (triggering multi-step AI chains for single-step tasks) inflates token counts dramatically. It's rewarded. It shouldn't be.
- This is the "lines-of-code" antipattern in a new guise. We stopped measuring developer output by lines committed decades ago because we learned it rewarded verbosity, not value.
The Surprising Data on AI Productivity
METR's 2025 randomised controlled trial found that experienced open-source developers took 19% longer to complete tasks when they were allowed to use AI tools, even though they believed the tools had made them 20% faster.
McKinsey & Company's 2026 Global AI Survey found that 86% of enterprises have increased their AI budgets, but only 29% can reliably measure the return. Meanwhile, Larridin, Inc. found some organisations reporting 50% more defects after adopting AI.
What Organisations Should Actually Measure
- Output quality: Defect rates, change failure rates, code maintainability
- Cycle time per successful outcome: Time to a working result, not tokens consumed
- Prompt efficiency: Tokens per correct answer (a ratio, not a volume)
- Business outcomes: Cost reduction, P&L-visible improvements, time saved
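To make the difference between a volume metric and a ratio metric concrete, here is a minimal Python sketch. The `UsageRecord` type, its fields, the definition of a "correct answer" (for example, output accepted after review), and the numbers are all hypothetical assumptions for illustration, not real data. It simply shows that a token-volume leaderboard and a tokens-per-correct-answer ranking can order the same two developers in opposite ways.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical per-developer usage summary for one reporting period."""
    developer: str
    tokens_used: int
    correct_answers: int  # assumed definition: outputs accepted after review


def tokens_per_correct_answer(record: UsageRecord) -> float:
    """Ratio metric (lower is better); infinity if nothing was correct."""
    if record.correct_answers == 0:
        return float("inf")
    return record.tokens_used / record.correct_answers


def rank_by_volume(records: list[UsageRecord]) -> list[UsageRecord]:
    """The leaderboard antipattern: most tokens consumed ranks first."""
    return sorted(records, key=lambda r: r.tokens_used, reverse=True)


def rank_by_efficiency(records: list[UsageRecord]) -> list[UsageRecord]:
    """Ratio-based ranking: fewest tokens per correct answer ranks first."""
    return sorted(records, key=tokens_per_correct_answer)


if __name__ == "__main__":
    # Illustrative numbers only, not real data.
    team = [
        UsageRecord("focused", tokens_used=40_000, correct_answers=20),
        UsageRecord("padder", tokens_used=400_000, correct_answers=10),
    ]
    print([r.developer for r in rank_by_volume(team)])      # ['padder', 'focused']
    print([r.developer for r in rank_by_efficiency(team)])  # ['focused', 'padder']
```

Returning infinity when nothing was correct ranks zero-output usage last, rather than raising an error on division by zero.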
McKinsey's five-layer AI measurement framework is explicit: top-performing organisations measure proficiency and outcomes, not adoption volume. The organisations seeing real AI ROI are those that defined success before deploying, not those that chased a leaderboard.
The Bottom Line
The instinct to gamify AI adoption is understandable. Tokens are countable, dashboards are visual, and leadership wants evidence of progress. But there's a difference between measuring activity and measuring value.
We wouldn't reward a surgeon for the number of incisions. We shouldn't reward an employee for the number of tokens.
Are you seeing this pattern in your organisation? What metrics are you finding most useful to capture genuine AI value? I'd be interested to hear your experiences — connect with me on LinkedIn or get in touch.

