
The 75% claim is everywhere. Nobody checks.

Different repo every week, same headline number. "Caveman prompt saves 75% of tokens."

The benchmarks behind these claims use a single-line system prompt and ten isolated one-shot prompts. No tool use, no file reads, no conversation history. That's not how anyone actually uses Claude Code.

This report runs a real audit. 20 sessions, 5,499 assistant messages, every token categorized by source. It shows where your tokens actually go.
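To make "categorized by source" concrete, here is a minimal sketch of that kind of tally, assuming a JSONL session export where each message carries typed content blocks (text, tool_use, tool_result). The file layout and the 4-characters-per-token estimate are illustrative assumptions, not the report's actual pipeline:

```python
import json
from collections import Counter

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. The real audit
    # presumably uses exact usage counts; this is an assumption.
    return max(1, len(text) // 4)

def audit(path: str) -> Counter:
    """Sum estimated tokens per content-block type across one session log."""
    totals = Counter()
    with open(path) as f:
        for line in f:
            msg = json.loads(line)
            # Assumes content is a list of typed blocks, e.g.
            # text / tool_use / tool_result.
            for block in msg.get("content", []):
                kind = block.get("type", "unknown")
                raw = json.dumps(block)  # count everything the model sees
                totals[kind] += estimate_tokens(raw)
    return totals

if __name__ == "__main__":
    totals = audit("session.jsonl")  # hypothetical export path
    grand = sum(totals.values())
    for kind, n in totals.most_common():
        print(f"{kind:12s} {n:8d} tokens  {100 * n / grand:5.1f}%")
```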


What the audit covers.

Why the published numbers don't apply.

What the benchmarks measure vs. what actually happens in real Claude Code sessions. Different inputs, different math.

56% tool results. 26% function calls. 9% chat.

The caveman rule can only touch that last 9%. Everything else goes straight to the model, untouched.

3.7% actual. 7% ceiling.

The real session-wide saving, broken down by session type. The math, not the marketing (a worked sketch follows this list).

Three lines for CLAUDE.md.

What to paste into your project config to handle the real problem without breaking Claude's ability to think.
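The ceiling is arithmetic you can check yourself. A minimal sketch using the shares above and the 75% headline claim; nothing here comes from the report beyond those numbers:

```python
# Why a headline 75% shrinks to a ~7% ceiling session-wide.
TOOL_RESULTS = 0.56    # tool output fed back to the model
FUNCTION_CALLS = 0.26  # structured calls a prompt style can't touch
CHAT = 0.09            # the only slice terser prompting can compress

CLAIMED_REDUCTION = 0.75  # the benchmark headline under test

# Best case: every chat token compresses at the full benchmark rate.
ceiling = CHAT * CLAIMED_REDUCTION
print(f"ceiling: {ceiling:.1%}")  # 6.8%, i.e. the "7% ceiling"

# The measured 3.7% sits below that because real chat tokens
# don't all compress at the benchmark rate.
```

The other 82% of tokens never pass through your phrasing at all, which is why no prompt style can move them.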


Grab the report.

The full audit: token breakdown, real benchmarks, and the three CLAUDE.md lines that actually work. One PDF, no gate.

Download the PDF →

Want to run your own audits? Join Agent-J+. We do this every week.