By Chris Sherlock, Head of Test Capability at Nimble Approach
This blog post explores how combining Test-Driven Development (TDD) with AI coding assistants leads to more accurate, reliable code – while keeping token usage efficient.
AI coding assistants like GitHub Copilot, Claude, and Cursor can spin up functions, classes, and boilerplate in seconds, making you feel unstoppable – right until they slip in a tiny mistake that derails everything.
With these tools, context is key: the more information you can provide about how the system should work, the better the result. But that context comes at a price – the more input tokens, the more expensive the request.
This is where Test-Driven Development (TDD) becomes invaluable. By combining the discipline of TDD with the speed of AI, you give the assistant precisely the context it needs and ensure it generates only what is required.
TDD & AI: Why It’s a Perfect Match
TDD follows a simple, powerful loop: Red / Green / Refactor. Here’s how AI supercharges every step.
1. The Red Phase (Write a Test)
Your Job: Write a small test for a single piece of functionality you want to build but that doesn’t exist yet. The test will fail.
- How AI Helps: This is a huge time-saver. You don’t have to write the test boilerplate. You just prompt the AI. This establishes the foundation for the entire workflow.
- You: “Write a pytest test for a function calculate_shipping_cost(weight, distance) that checks for a negative weight.”
- AI: Generates the test file, imports, test class, and the specific pytest.raises(ValueError) assertion (see the sketch after this list).
- Further prompting: You can make further use of the AI assistant for edge cases with a simple prompt such as “What are some edge cases for this function?” and the AI will suggest tests for weight=0, null inputs, and so on, giving you more robust coverage from the start.
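To make that concrete, the generated test for the negative-weight prompt above might look roughly like the sketch below. It is an illustration only: the module name shipping and the test name are assumptions, and the assistant may wrap tests in a class depending on your conventions.
import pytest

from shipping import calculate_shipping_cost  # assumed module name, for illustration only

def test_raises_value_error_for_negative_weight():
    # A negative weight is invalid input, so we expect a ValueError
    with pytest.raises(ValueError):
        calculate_shipping_cost(weight=-1, distance=100)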
2. The Green Phase (Write the Code to Pass the Test)
Your Job: Write the minimum amount of code needed to make the currently failing test pass.
- How AI Helps: Because the test now precisely defines the requirement, your failing test provides perfect context for the AI. It’s not guessing your intentions; it’s solving a specific, well-defined problem.
- You: “Write the code to make this test pass.”
- AI: Sees the function name, the inputs, and the expected output, and writes the exact implementation code to satisfy that test, rather than a generic, over-engineered solution (see the sketch below).
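Continuing the hypothetical shipping example, the minimal Green-phase code might be no more than this sketch, assuming a shipping.py module and the negative-weight test above:
# shipping.py (assumed module name)

def calculate_shipping_cost(weight: float, distance: float) -> float:
    # The only behaviour the current test pins down is rejecting negative weight
    if weight < 0:
        raise ValueError("weight must be non-negative")
    # No test constrains the actual cost yet, so a placeholder value is enough
    return 0.0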
3. The Refactor Phase (Clean Up the Code)
Your Job: Your test is passing, but your code might be messy. Now you clean it up, relying on your test suite to ensure you don’t break anything.
- How AI Helps: This phase removes the “fear of change.” With a trusted test suite in place, AI can safely propose structural improvements.
- You: “This function is too long. Can you refactor it into smaller, cleaner methods?”
- AI: Suggests a refactoring (see the sketch after this list).
- You: Run your test suite. If it’s still passing, you accept the change with confidence. If it fails, you know the AI’s suggestion was flawed and you reject it.
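As a purely illustrative sketch, assume the shipping function has since grown a distance check and a simple pricing rule (both assumptions added for illustration). A refactor the assistant might propose is to extract the guard clauses into a helper, with the test suite confirming behaviour is unchanged:
def _validate_inputs(weight: float, distance: float) -> None:
    # Guard clauses extracted from the main function during the refactor
    if weight < 0:
        raise ValueError("weight must be non-negative")
    if distance < 0:
        raise ValueError("distance must be non-negative")

def calculate_shipping_cost(weight: float, distance: float) -> float:
    _validate_inputs(weight, distance)
    # Assumed pricing rule, unchanged in behaviour by the refactor
    return weight * 0.5 + distance * 0.1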
In Action: A Simple Python Example
To illustrate this workflow end-to-end, let’s walk through a concrete example. Here, we will build a password validator, is_valid_password, using this TDD/AI workflow.
We have two files:
- validator.py (for our function)
- test_validator.py (for our tests)
Cycle 1: The First Requirement (Minimum Length)
Requirement: The password must be at least 8 characters long.
Red Phase: Writing the Test
Your Prompt: “Write two pytest tests for a function is_valid_password:
- test_fails_if_too_short (asserts False for ‘short’).
- test_passes_if_long_enough (asserts True for ‘longenough’).”
AI-Generated test_validator.py:
import pytest
from validator import is_valid_password

def test_fails_if_too_short():
    assert is_valid_password("short") == False

def test_passes_if_long_enough():
    assert is_valid_password("longenough") == True
Action: Run pytest. It fails (as expected) because is_valid_password doesn’t exist. We are Red.
Green Phase: Write the Code
Your Prompt: “The tests are failing. Write the code for the is_valid_password function in validator.py to make them pass.”
AI-Generated validator.py:
def is_valid_password(password):
    if len(password) < 8:
        return False
    return True
Action: Run pytest. It passes! We are Green.
Refactor Phase: Cleaning up the Code
Your Prompt: “Refactor is_valid_password to be more Pythonic.”
AI-Refactored validator.py:
def is_valid_password(password):
    """Checks if a password is valid based on length."""
    return len(password) >= 8
Action: Run pytest. It still passes! The refactor is verified.
With this first cycle complete, we now have a working baseline and a repeatable process for introducing new rules.
Cycle 2: Adding a New Rule (Uppercase)
Requirement: The password must now also contain at least one uppercase letter.
Red Phase: AI Adds a Test
Your Prompt: “A password like ‘longenough’ should now fail. Update the test suite to include a test test_fails_if_no_uppercase. Then rename test_passes_if_long_enough to test_passes_with_valid_password and use ‘LongEnough’ as its input.”
AI-Modified test_validator.py:
import pytest
from validator import is_valid_password

def test_fails_if_too_short():
    assert is_valid_password("Short") == False

def test_fails_if_no_uppercase():
    assert is_valid_password("longenough") == False  # New failing test

def test_passes_with_valid_password():
    assert is_valid_password("LongEnough") == True
Action: Run pytest. The new test_fails_if_no_uppercase fails. We are Red.
Green Phase: AI Fixes the Code
Your Prompt: “The test is failing. Update is_valid_password to also require an uppercase letter.”
AI-Generated validator.py:
import re

def is_valid_password(password):
    """Checks if a password is valid based on multiple criteria."""
    checks = [
        lambda s: len(s) >= 8,
        lambda s: re.search(r'[A-Z]', s) is not None
    ]
    return all(check(password) for check in checks)
(Bonus! The AI went a step further and refactored as it wrote, producing a list of checks so that more rules can be added easily.)
Action: Run pytest. All tests pass. We are Green. The AI assistant handled the new requirement and the refactor at once, and our tests prove it’s correct.
Here, the TDD loop reveals its true strength: adding requirements becomes predictable and low-risk.
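To make that concrete, suppose a later (hypothetical) requirement demanded at least one digit. The Red phase would be one new failing test, and the Green phase a single extra entry in the existing checks list, roughly like this sketch:
import re

def is_valid_password(password):
    """Checks if a password is valid based on multiple criteria."""
    checks = [
        lambda s: len(s) >= 8,
        lambda s: re.search(r'[A-Z]', s) is not None,
        # Hypothetical new rule: the password must contain at least one digit
        lambda s: re.search(r'[0-9]', s) is not None
    ]
    return all(check(password) for check in checks)
Any existing passing tests whose sample passwords lack a digit would fail immediately, which is exactly the safety net the loop provides.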
Next Stage: TDD, AI, and Mocking External Services
The previous example showed how to begin using TDD with AI. To demonstrate how this approach scales to real-world, multi-module systems, let’s look at a more realistic case: a call to an external system you don’t control, which means you need to introduce mocking.
Goal: Build a LoanService that approves a loan only if an external CreditService reports a score of 700+. We can’t actually call the CreditService in our unit tests, so we need to use mocks.
The External Service (We can’t change this):
# credit_service.py
def get_credit_score(user_id: str) -> int:
    # ... complex, slow network call ...
    print("WARNING: REAL CREDIT SERVICE WAS CALLED!")
    return 800
Cycle 1: The Happy Path (Application Approved)
Red Phase: AI Writes the Test and the Mock
Your Prompt: “I’m in test_loan_service.py. I need to test is_loan_application_approved from loan_service.py. This function uses get_credit_score from credit_service.py.
Write a pytest test called test_application_approved_for_good_credit using the mocker fixture.
- Patch loan_service.get_credit_score.
- Make the mock return a value of 750.
- Assert is_loan_application_approved(‘user-123’) returns True.”
AI-Generated test_loan_service.py:
import pytest
from loan_service import is_loan_application_approved

def test_application_approved_for_good_credit(mocker):
    # Patch the dependency where it's used
    mock_get_score = mocker.patch('loan_service.get_credit_score')
    # Set the mock's return value
    mock_get_score.return_value = 750

    assert is_loan_application_approved('user-123') == True
    # Verify the mock was called as expected
    mock_get_score.assert_called_once_with('user-123')
Action: Run pytest. It fails as the function doesn’t exist. We are Red.
Green Phase: AI Writes Code to Pass the Mock
Your Prompt: “The test is failing. Go to loan_service.py and write the minimal code for is_loan_application_approved to make it pass, using get_credit_score.”
AI-Generated loan_service.py:
from credit_service import get_credit_score
def is_loan_application_approved(user_id: str) -> bool:
score = get_credit_score(user_id)
return score >= 700
Action: Run pytest. It passes! And importantly, we did not see our warning message in the console. Our mock worked, and the AI-generated code passed our spec. We are Green.
With the happy path covered, the next cycle shows how the same loop keeps the code correct and resilient when the external dependency fails or behaves unexpectedly.
Cycle 2: The “Exception Path” (Service Fails)
This is the real test of robust code. What happens if the credit service is down?
Red Phase: AI Mocks an Exception
Your Prompt: “Add a new test: test_application_rejected_if_service_crashes.
- Mock get_credit_score to raise an exception using side_effect=Exception(‘Service Down’).
- Assert our function catches this and returns False (it should not crash).”
AI-Generated test_loan_service.py:
# ... (previous test) ...

def test_application_rejected_if_service_crashes(mocker):
    mocker.patch(
        'loan_service.get_credit_score',
        side_effect=Exception('Service Down')
    )
    # We expect our code to catch this and return False
    assert is_loan_application_approved('user-789') == False
Action: Run pytest. The test fails with Exception: Service Down. Our function doesn’t have error handling and crashes. We are Red.
Green Phase: AI Writes Robust, Error-Handling Code
Your Prompt: “The new test test_application_rejected_if_service_crashes is failing. Update is_loan_application_approved to catch any Exception from get_credit_score and safely return False.”
AI-Generated loan_service.py:
from credit_service import get_credit_score
import logging

def is_loan_application_approved(user_id: str) -> bool:
    try:
        score = get_credit_score(user_id)
        return score >= 700
    except Exception as e:
        # AI correctly adds the try/except and logging
        logging.warning(f"Credit service failed for {user_id}: {e}")
        return False  # Gracefully return False
Action: Run pytest. All tests pass!
The Takeaway: Context is Everything
By using a TDD approach with your AI coding assistants, you’re doing more than just generating code: you’re working in a highly resilient way based on solid engineering principles. TDD may have been around for a while, but it excels in an AI-driven workflow, shifting the focus from fixing unpredictable outputs to defining the right behaviour upfront and giving the AI the context it needs to deliver faster and more accurately.
At Nimble Approach, we help teams adopt and embed practices like TDD into their everyday development workflows, enabling them to move quickly with confidence. Find out how we can transform your testing approach and accelerate delivery here.