
By Gareth Hallberg, Lead Consultant at Nimble Approach

In this post, I break down how an ordinary proof-of-concept spiralled into a 66-million-token incident – and what it taught me about the real costs of modern AI workflows. If your team uses LLMs in development, this story highlights practical steps to avoid runaway compute, expense, and carbon impact.

Last week, I made a mistake that would haunt any environmentally conscious developer. While working on a proof-of-concept for some pre-sales activity, I asked Gemini CLI for what I thought would be a quick code review. When I finally quit the session, I had used over 66 million tokens.

To put that in perspective: I had just consumed enough energy to boil 2,000 kettles. I had generated approximately 548 kilograms of CO₂ – equivalent to driving 1,400 miles or what an average person produces in 1.5 months of normal activities.
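
If you want to sanity-check figures like these, a back-of-envelope calculation is straightforward. The energy-per-token and kettle figures below are illustrative assumptions, not measured values; real costs vary enormously by model, hardware, and data centre.

```python
# Back-of-envelope energy estimate. Both constants are illustrative
# assumptions, not measured values.
TOKENS_USED = 66_000_000
WH_PER_1K_TOKENS = 3.3        # assumed energy cost per 1,000 tokens
KWH_PER_KETTLE_BOIL = 0.11    # roughly one full kettle boiled

energy_kwh = TOKENS_USED / 1_000 * WH_PER_1K_TOKENS / 1_000
kettles = energy_kwh / KWH_PER_KETTLE_BOIL

print(f"{energy_kwh:,.0f} kWh ≈ {kettles:,.0f} kettle boils")
# Multiplying energy_kwh by a grid-intensity factor (kg CO₂ per kWh,
# which varies by region) gives a carbon figure.
```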

All for a “quick” code review.

The Setup: A Simple PoC Gone Wrong

The application I was building seemed straightforward enough (the sketch just after this list shows the intended flow):

  • User asks a question through a chat interface
  • AI acts as a “front door” to decide which product they want using RAG
  • RAG returns a product_id
  • We have a guided conversation using NL2SQL
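
Here is a minimal, self-contained sketch of that flow. Every function is a placeholder standing in for a real component (vector store, LLM call, database); none of this is the actual PoC code.

```python
# Hypothetical sketch of the PoC flow; names and bodies are placeholders.

def rag_route_to_product(question: str) -> str:
    """Placeholder for the RAG 'front door' that picks a product."""
    return "product-123"  # a real version would embed the question and search

def nl2sql(question: str, product_id: str) -> str:
    """Placeholder for the NL2SQL step of the guided conversation."""
    return f"SELECT * FROM products WHERE id = '{product_id}'"

def run_query(sql: str) -> list[dict]:
    """Placeholder for executing the generated SQL."""
    return [{"id": "product-123", "name": "Example product"}]

def answer(question: str) -> str:
    product_id = rag_route_to_product(question)      # chat -> RAG -> product_id
    rows = run_query(nl2sql(question, product_id))   # guided NL2SQL step
    return f"Based on {len(rows)} matching record(s): {rows[0]['name']}"

print(answer("Which product suits a small team?"))
```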

Simple, right? But when I asked Gemini CLI to review my code, and asked it to use Context7 to pull all of the documentation so it could verify we were on the latest versions of the various libraries, I unknowingly set off a chain reaction that would consume massive computational resources.

The Anatomy of a Token Explosion

Here’s exactly how my “quick review” spiralled out of control:

Step 1: Context Loading

  • Context7 pulled all documentation: 143,071 input tokens
  • This created our massive baseline cost

Step 2: Code Review Processing

  • AI processing and response generation: +717,871 tokens
  • We’re now at 860,942 tokens total

Step 3: The Hidden Costs

  • Thoughts tokens (internal reasoning): +656,000 tokens
  • Cached tokens: +360,000 tokens
  • Total so far: ~1.87 million tokens

Step 4: The Conversation Tax

I typed “wait here” – a simple two-word message.

Cost: 400,000 tokens.

Why? The entire conversation history gets re-sent with every message.

This pattern continued. Every small interaction carried the weight of our entire conversation history – compressed, but still costly.
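
To see why a two-word message can cost hundreds of thousands of tokens, it helps to remember that chat-style model APIs are typically stateless: the client sends the full message list with every call. A minimal sketch (token counts are crude approximations):

```python
# Why every message re-pays for the whole history: chat APIs are
# stateless, so the client sends the full message list on each call.

def rough_tokens(text: str) -> int:
    return len(text) // 4  # crude approximation: ~4 characters per token

# The first message stands in for the loaded docs and code under review.
history = [{"role": "user", "content": "Review my code. " + "x" * 600_000}]

for turn, user_msg in enumerate(["Looks good?", "wait here"], start=1):
    history.append({"role": "user", "content": user_msg})
    # This is what actually gets sent to the model on this turn:
    payload = "".join(m["content"] for m in history)
    print(f"turn {turn}: {rough_tokens(payload):,} input tokens "
          f"for a {rough_tokens(user_msg)}-token message")
    history.append({"role": "assistant", "content": "…model reply…"})
```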

The Three Token Killers

Through this painful experience, I identified three primary ways AI conversations consume resources far faster than intuition suggests:

1. Context Loading

When you ask an AI to “get all the documentation,” you’re creating a massive baseline cost. My 143k token starting point was like revving a Ferrari engine before even putting it in drive.

2. Processing Amplification

AI doesn’t just read your request – it generates extensive responses, internal reasoning, and contextual understanding. My simple code review request generated over 700k additional tokens in processing.

3. The Conversation Memory Tax

This is the hidden killer. Every message in an AI conversation carries the full history. Even typing “wait here” cost me 400,000 tokens because the AI had to process our entire conversation again to understand the context.

Common Culprits in AI Development

Based on this experience and discussions with my team, here are the most dangerous patterns we’ve identified:

  • RAG Gone Wild: Retrieval systems that pull too much context. It’s tempting to give the AI “everything it might need,” but every retrieved token becomes part of the history that each later message pays for again.

  • Iterative Code Reviews: Each round of feedback compounds the token usage.

  • Documentation Scraping: Tools like Context7 that can pull entire documentation sets – useful, but potentially devastating if not used carefully.

  • Agent Chains: When AI agents talk to each other, token usage can spiral quickly.

  • Long-Running Conversations: The longer the conversation, the more expensive each subsequent message becomes due to the conversation memory tax (the sketch below puts rough numbers on this).
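
To put rough numbers on that last point: if the whole history is re-read every turn, cumulative input cost grows quadratically with conversation length, not linearly. The baseline and per-turn figures below are assumptions chosen only to be in the ballpark of my session.

```python
# If each turn adds ~k tokens on top of a fixed baseline c, and the
# full history is re-sent every turn, cumulative input cost is
# sum over i of (c + i*k) - quadratic in the number of turns.
c, k = 143_000, 20_000  # assumed baseline context and per-turn growth

for n in (5, 20, 50):
    total = sum(c + i * k for i in range(1, n + 1))
    print(f"{n:>3} turns -> {total:,} cumulative input tokens")
```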

Prevention: Building Sustainable AI Practices

Here’s what I’ve learned about using AI responsibly:

Token Budgeting

Set limits before you start. Many AI services offer usage tracking and alerts. Use them.
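
On top of any provider-side alerts, you can enforce a hard cap client-side. Here is a minimal sketch, not tied to any particular SDK:

```python
# Minimal client-side token budget: check before each call, stop hard
# when the session would exceed the cap.

class TokenBudgetExceeded(RuntimeError):
    pass

class TokenBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def spend(self, tokens: int) -> None:
        if self.used + tokens > self.max_tokens:
            raise TokenBudgetExceeded(
                f"{self.used + tokens:,} would exceed cap of {self.max_tokens:,}"
            )
        self.used += tokens

budget = TokenBudget(max_tokens=1_000_000)
budget.spend(860_942)          # fine: the initial review
try:
    budget.spend(400_000)      # the 'wait here' message would blow the cap
except TokenBudgetExceeded as exc:
    print(f"Blocked: {exc}")
```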

Context Hygiene

  • Be selective about what context you load
  • Use summarisation techniques for long conversations
  • Regularly “reset” sessions when the history becomes unwieldy (sketched below)
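
One way to put the last two points into practice: once the history passes a threshold, collapse the older turns into a short summary and carry on from that. A sketch, with the summarisation step stubbed out (a real version would make one cheap LLM call):

```python
# Conversation 'reset': when history grows past a threshold, replace
# the older turns with a short summary and continue from there.

MAX_HISTORY_TOKENS = 50_000

def rough_tokens(messages: list[dict]) -> int:
    return sum(len(m["content"]) // 4 for m in messages)

def summarise(messages: list[dict]) -> str:
    # Stub: a real version would ask a small, cheap model for a recap.
    return f"Summary of {len(messages)} earlier messages."

def compact(history: list[dict]) -> list[dict]:
    if rough_tokens(history) <= MAX_HISTORY_TOKENS:
        return history
    # Keep the last few turns verbatim; fold the rest into a summary.
    head, tail = history[:-4], history[-4:]
    return [{"role": "system", "content": summarise(head)}, *tail]

history = [{"role": "user", "content": "x" * 300_000}] + \
          [{"role": "user", "content": f"turn {i}"} for i in range(6)]
print(len(compact(history)), "messages after compaction")  # prints 5
```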

Conversation Architecture

Know when to start fresh versus continuing a conversation. Sometimes it’s more efficient to begin a new session than to carry forward massive context.

Monitoring and Alerts

Implement real-time usage tracking. Don’t wait until the end of the month to discover your token bomb.
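
Real-time tracking can be as simple as accumulating the usage figures most chat APIs return with each response and shouting when a threshold is crossed. A sketch follows; the shape of the usage dict is an assumption, as field names vary by provider.

```python
# Accumulate per-response token counts and alert on thresholds.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("token-watch")

ALERT_THRESHOLDS = (500_000, 1_000_000, 5_000_000)

class UsageTracker:
    def __init__(self):
        self.total = 0
        self.fired = set()

    def record(self, usage: dict) -> None:
        self.total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
        for threshold in ALERT_THRESHOLDS:
            if self.total >= threshold and threshold not in self.fired:
                self.fired.add(threshold)
                log.warning("Session passed %s tokens", f"{threshold:,}")

tracker = UsageTracker()
# Fires the 500k and 1M alerts:
tracker.record({"input_tokens": 860_942, "output_tokens": 160_000})
```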

Team Guidelines

Establish shared practices for responsible AI use:

  • Context loading limits
  • Conversation length guidelines
  • Regular usage reviews
  • Clear escalation procedures when usage spikes

The Bigger Picture

My 66 million token mistake represents more than just a billing surprise or an environmental impact. It highlights a critical challenge as AI becomes ubiquitous in development workflows: the hidden costs of convenience.

Every time we ask an AI to “analyse everything” or “consider all possibilities,” we’re making a trade-off between thoroughness and resource consumption. In a world where AI inference is powered largely by data centres running on electricity from various sources (many still fossil-fuel-dependent), these decisions have real environmental consequences.

The 2,000-Kettle Reality Check

When I calculated that my token usage was equivalent to boiling 2,000 kettles, it fundamentally changed how I think about AI interactions. Each query isn’t free – it has a carbon footprint, an energy cost, and a financial impact.

This doesn’t mean we should stop using AI tools. They’re incredibly powerful and can make us more productive and creative. But it does mean we need to use them thoughtfully.

Before you hit enter on that next AI request, ask yourself:

  • Do I really need all the context, or just the relevant parts?

  • Could I break this into smaller, more focused queries?

  • Am I continuing a conversation that’s become unwieldy?

  • Have I set appropriate usage limits?

Moving Forward

The future of sustainable AI development isn’t about using less AI – it’s about using it more intelligently. As these tools become more integrated into our workflows, we need to develop the same kind of resource consciousness we’ve learned to apply to other aspects of software development.

Think of token usage like memory allocation in programming: powerful when used correctly, dangerous when ignored, and always worth monitoring.

My 66 million token mistake taught me that in the age of AI, every developer needs to become an energy-conscious developer. The planet depends on it.

If you’re looking to build AI systems that are powerful and efficient, our team can help. Contact us to discuss how we can support your sustainable AI strategy.
