
By Gareth Hallberg, Lead Consultant at Nimble Approach

In this post, I break down how an ordinary proof-of-concept spiralled into a 66-million-token incident – and what it taught me about the real costs of modern AI workflows. If your team uses LLMs in development, this story highlights practical steps to avoid runaway compute, expense, and carbon impact.

Last week, I made a mistake that would haunt any environmentally conscious developer. While working on a proof-of-concept for some pre-sales activity, I asked Gemini CLI for what I thought would be a quick code review. When I finally quit the session, I had used over 66 million tokens.

To put that in perspective: I had just consumed enough energy to boil 2,000 kettles. I had generated approximately 548 kilograms of CO₂ – equivalent to driving 1,400 miles or what an average person produces in 1.5 months of normal activities.
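
If you want to sanity-check figures like these, a back-of-envelope calculation is straightforward. The energy-per-token and kettle figures below are illustrative assumptions, not measured values; real costs vary enormously by model, hardware, and data centre.

```python
# Back-of-envelope energy estimate. Both constants are illustrative
# assumptions, not measured values.
TOKENS_USED = 66_000_000
WH_PER_1K_TOKENS = 3.3        # assumed energy cost per 1,000 tokens
KWH_PER_KETTLE_BOIL = 0.11    # roughly one full kettle boiled

energy_kwh = TOKENS_USED / 1_000 * WH_PER_1K_TOKENS / 1_000
kettles = energy_kwh / KWH_PER_KETTLE_BOIL

print(f"{energy_kwh:,.0f} kWh ≈ {kettles:,.0f} kettle boils")
# Multiplying energy_kwh by a grid-intensity factor (kg CO₂ per kWh,
# which varies by region) gives a carbon figure.
```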

All for a “quick” code review.

The Setup: A Simple PoC Gone Wrong

The application I was building seemed straightforward enough (the sketch just after this list shows the intended flow):

  • User asks a question through a chat interface
  • AI acts as a “front door” to decide which product they want using RAG
  • RAG returns a product_id
  • We have a guided conversation using NL2SQL
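
Here is a minimal, self-contained sketch of that flow. Every function is a placeholder standing in for a real component (vector store, LLM call, database); none of this is the actual PoC code.

```python
# Hypothetical sketch of the PoC flow; names and bodies are placeholders.

def rag_route_to_product(question: str) -> str:
    """Placeholder for the RAG 'front door' that picks a product."""
    return "product-123"  # a real version would embed the question and search

def nl2sql(question: str, product_id: str) -> str:
    """Placeholder for the NL2SQL step of the guided conversation."""
    return f"SELECT * FROM products WHERE id = '{product_id}'"

def run_query(sql: str) -> list[dict]:
    """Placeholder for executing the generated SQL."""
    return [{"id": "product-123", "name": "Example product"}]

def answer(question: str) -> str:
    product_id = rag_route_to_product(question)      # chat -> RAG -> product_id
    rows = run_query(nl2sql(question, product_id))   # guided NL2SQL step
    return f"Based on {len(rows)} matching record(s): {rows[0]['name']}"

print(answer("Which product suits a small team?"))
```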

Simple, right? But when I asked Gemini CLI to review my code, and asked it to use Context7 to pull all of the documentation so it could verify we were on the latest versions of the various libraries, I unknowingly set off a chain reaction that would consume massive computational resources.

The Anatomy of a Token Explosion

Here’s exactly how my “quick review” spiralled out of control:

Step 1: Context Loading

  • Context7 pulled all documentation: 143,071 input tokens
  • This created our massive baseline cost

Step 2: Code Review Processing

  • AI processing and response generation: +717,871 tokens
  • We’re now at 860,942 tokens total

Step 3: The Hidden Costs

  • Thoughts tokens (internal reasoning): +656,000 tokens
  • Cached tokens: +360,000 tokens
  • Total so far: ~1.87 million tokens

Step 4: The Conversation Tax

I typed “wait here” – a simple two-word message.

Cost: 400,000 tokens.

Why? The entire conversation history gets re-sent with every message.

This pattern continued. Every small interaction carried the weight of our entire conversation history – compressed, but still costly.
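
To see why a two-word message can cost hundreds of thousands of tokens, it helps to remember that chat-style model APIs are typically stateless: the client sends the full message list with every call. A minimal sketch (token counts are crude approximations):

```python
# Why every message re-pays for the whole history: chat APIs are
# stateless, so the client sends the full message list on each call.

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude approximation: ~4 characters per token

# The first message stands in for the loaded docs and code under review.
history = [{"role": "user", "content": "Review my code. " + "x" * 600_000}]

for turn, user_msg in enumerate(["Looks good?", "wait here"], start=1):
    history.append({"role": "user", "content": user_msg})
    # This is what actually gets sent to the model on this turn:
    payload = "".join(m["content"] for m in history)
    print(f"turn {turn}: {rough_tokens(payload):,} input tokens "
          f"for a {rough_tokens(user_msg)}-token message")
    history.append({"role": "assistant", "content": "…model reply…"})
```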

The Three Token Killers

Through this painful experience, I identified three primary ways AI conversations consume resources far faster than intuition suggests:

1. Context Loading

When you ask an AI to “get all the documentation,” you’re creating a massive baseline cost. My 143k token starting point was like revving a Ferrari engine before even putting it in drive.

2. Processing Amplification

AI doesn’t just read your request – it generates extensive responses, internal reasoning, and contextual understanding. My simple code review request generated over 700k additional tokens in processing.

3. The Conversation Memory Tax

This is the hidden killer. Every message in an AI conversation carries the full history. Even typing “wait here” cost me 400,000 tokens because the AI had to process our entire conversation again to understand the context.

Common Culprits in AI Development

Based on this experience and discussions with my team, here are the most dangerous patterns we’ve identified:

  • RAG Gone Wild: Retrieval systems that pull too much context. It’s tempting to give the AI “everything it might need,” but every retrieved token becomes part of the history that each later message pays for again.

  • Iterative Code Reviews: Each round of feedback compounds the token usage.

  • Documentation Scraping: Tools like Context7 that can pull entire documentation sets – useful, but potentially devastating if not used carefully.

  • Agent Chains: When AI agents talk to each other, token usage can spiral quickly.

  • Long-Running Conversations: The longer the conversation, the more expensive each subsequent message becomes due to the conversation memory tax (the sketch below puts rough numbers on this).
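
To put rough numbers on that last point: if the whole history is re-read every turn, cumulative input cost grows quadratically with conversation length, not linearly. The baseline and per-turn figures below are assumptions chosen only to be in the ballpark of my session.

```python
# If each turn adds ~k tokens on top of a fixed baseline c, and the
# full history is re-sent every turn, cumulative input cost is
# sum over i of (c + i*k) - quadratic in the number of turns.
c, k = 143_000, 20_000  # assumed baseline context and per-turn growth

for n in (5, 20, 50):
    total = sum(c + i * k for i in range(1, n + 1))
    print(f"{n:>3} turns -> {total:,} cumulative input tokens")
```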

Prevention: Building Sustainable AI Practices

Here’s what I’ve learned about using AI responsibly:

Token Budgeting

Set limits before you start. Many AI services offer usage tracking and alerts. Use them.
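
On top of any provider-side alerts, you can enforce a hard cap client-side. Here is a minimal sketch, not tied to any particular SDK:

```python
# Minimal client-side token budget: check before each call, stop hard
# when the session would exceed the cap.

class TokenBudgetExceeded(RuntimeError):
    pass

class TokenBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def spend(self, tokens: int) -> None:
        if self.used + tokens > self.max_tokens:
            raise TokenBudgetExceeded(
                f"{self.used + tokens:,} would exceed cap of {self.max_tokens:,}"
            )
        self.used += tokens

budget = TokenBudget(max_tokens=1_000_000)
budget.spend(860_942)          # fine: the initial review
try:
    budget.spend(400_000)      # the 'wait here' message would blow the cap
except TokenBudgetExceeded as exc:
    print(f"Blocked: {exc}")
```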

Context Hygiene

  • Be selective about what context you load
  • Use summarisation techniques for long conversations
  • Regularly “reset” sessions when the history becomes unwieldy (sketched below)
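
One way to put the last two points into practice: once the history passes a threshold, collapse the older turns into a short summary and carry on from that. A sketch, with the summarisation step stubbed out (a real version would make one cheap LLM call):

```python
# Conversation 'reset': when history grows past a threshold, replace
# the older turns with a short summary and continue from there.

MAX_HISTORY_TOKENS = 50_000

def rough_tokens(messages: list[dict]) -> int:
    return sum(len(m["content"]) // 4 for m in messages)

def summarise(messages: list[dict]) -> str:
    # Stub: a real version would ask a small, cheap model for a recap.
    return f"Summary of {len(messages)} earlier messages."

def compact(history: list[dict]) -> list[dict]:
    if rough_tokens(history) <= MAX_HISTORY_TOKENS:
        return history
    # Keep the last few turns verbatim; fold the rest into a summary.
    head, tail = history[:-4], history[-4:]
    return [{"role": "system", "content": summarise(head)}, *tail]

history = [{"role": "user", "content": "x" * 300_000}] + \
          [{"role": "user", "content": f"turn {i}"} for i in range(6)]
print(len(compact(history)), "messages after compaction")  # prints 5
```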

Conversation Architecture

Know when to start fresh versus continuing a conversation. Sometimes it’s more efficient to begin a new session than to carry forward massive context.

Monitoring and Alerts

Implement real-time usage tracking. Don’t wait until the end of the month to discover your token bomb.
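
Real-time tracking can be as simple as accumulating the usage figures most chat APIs return with each response and shouting when a threshold is crossed. A sketch follows; the shape of the usage dict is an assumption, as field names vary by provider.

```python
# Accumulate per-response token counts and alert on thresholds.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("token-watch")

ALERT_THRESHOLDS = (500_000, 1_000_000, 5_000_000)

class UsageTracker:
    def __init__(self):
        self.total = 0
        self.fired = set()

    def record(self, usage: dict) -> None:
        self.total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
        for threshold in ALERT_THRESHOLDS:
            if self.total >= threshold and threshold not in self.fired:
                self.fired.add(threshold)
                log.warning("Session passed %s tokens", f"{threshold:,}")

tracker = UsageTracker()
# Fires the 500k and 1M alerts:
tracker.record({"input_tokens": 860_942, "output_tokens": 160_000})
```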

Team Guidelines

Establish shared practices for responsible AI use:

  • Context loading limits
  • Conversation length guidelines
  • Regular usage reviews
  • Clear escalation procedures when usage spikes

The Bigger Picture

My 66 million token mistake represents more than just a billing surprise or an environmental impact. It highlights a critical challenge as AI becomes ubiquitous in development workflows: the hidden costs of convenience.

Every time we ask an AI to “analyse everything” or “consider all possibilities,” we’re making a trade-off between thoroughness and resource consumption. In a world where AI inference is powered largely by data centres running on electricity from various sources (many still fossil-fuel-dependent), these decisions have real environmental consequences.

The 2,000-Kettle Reality Check

When I calculated that my token usage was equivalent to boiling 2,000 kettles, it fundamentally changed how I think about AI interactions. Each query isn’t free – it has a carbon footprint, an energy cost, and a financial impact.

This doesn’t mean we should stop using AI tools. They’re incredibly powerful and can make us more productive and creative. But it does mean we need to use them thoughtfully.

Before you hit enter on that next AI request, ask yourself:

  • Do I really need all the context, or just the relevant parts?

  • Could I break this into smaller, more focused queries?

  • Am I continuing a conversation that’s become unwieldy?

  • Have I set appropriate usage limits?

Moving Forward

The future of sustainable AI development isn’t about using less AI – it’s about using it more intelligently. As these tools become more integrated into our workflows, we need to develop the same kind of resource consciousness we’ve learned to apply to other aspects of software development.

Think of token usage like memory allocation in programming: powerful when used correctly, dangerous when ignored, and always worth monitoring.

My 66 million token mistake taught me that in the age of AI, every developer needs to become an energy-conscious developer. The planet depends on it.

If you’re looking to build AI systems that are powerful and efficient, our team can help. Contact us to discuss how we can support your sustainable AI strategy.
