By Chris Sherlock, Head of Test Capability at Nimble Approach
This blog post explores how combining Test-Driven Development (TDD) with AI coding assistants leads to more accurate, reliable code – while keeping token usage efficient.
AI coding assistants like GitHub Copilot, Claude, and Cursor can spin up functions, classes, and boilerplate in seconds, making you feel unstoppable – right until they slip in a tiny mistake that derails everything.
With these tools, context is key: the more information you can provide about how the system should work, the better the result. But that context comes at a price – the more input tokens, the more expensive the request.
This is where Test-Driven Development (TDD) becomes invaluable. By combining the discipline of TDD with the speed of AI, you give the assistant precisely the context it needs and ensure it generates only what is required.
TDD & AI: Why It’s a Perfect Match
TDD follows a simple, powerful loop: Red / Green / Refactor. Here’s how AI supercharges every step.
1. The Red Phase (Write a Test)
Your Job: Write a small test for a single piece of functionality you want to build but that doesn’t exist yet. The test will fail.
- How AI Helps: This is a huge time-saver. You don’t have to write the test boilerplate. You just prompt the AI. This establishes the foundation for the entire workflow.
- You: “Write a pytest test for a function calculate_shipping_cost(weight, distance) that checks for a negative weight.”
- AI: Generates the test file, imports, test class, and the specific pytest.raises(ValueError) assertion (see the sketch after this list).
- Further prompting: You can make further use of the AI assistant for edge cases with a simple prompt such as “What are some edge cases for this function?” and the AI will suggest tests for weight=0, null inputs, and so on, giving you more robust coverage from the start.
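To make that concrete, the generated test for the negative-weight prompt above might look roughly like the sketch below. It is an illustration only: the module name shipping and the test name are assumptions, and the assistant may wrap tests in a class depending on your conventions.
import pytest

from shipping import calculate_shipping_cost  # assumed module name, for illustration only

def test_raises_value_error_for_negative_weight():
    # A negative weight is invalid input, so we expect a ValueError
    with pytest.raises(ValueError):
        calculate_shipping_cost(weight=-1, distance=100)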
2. The Green Phase (Write the Code to Pass the Test)
Your Job: Write the minimum amount of code needed to make the currently failing test pass.
- How AI Helps: Because the test now precisely defines the requirement, your failing test provides perfect context for the AI. It’s not guessing your intentions; it’s solving a specific, well-defined problem.
- You: “Write the code to make this test pass.”
- AI: Sees the function name, the inputs, and the expected output, and writes the exact implementation code to satisfy that test, rather than a generic, over-engineered solution (see the sketch below).
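Continuing the hypothetical shipping example, the minimal Green-phase code might be no more than this sketch, assuming a shipping.py module and the negative-weight test above:
# shipping.py (assumed module name)

def calculate_shipping_cost(weight: float, distance: float) -> float:
    # The only behaviour the current test pins down is rejecting negative weight
    if weight < 0:
        raise ValueError("weight must be non-negative")
    # No test constrains the actual cost yet, so a placeholder value is enough
    return 0.0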
3. The Refactor Phase (Clean Up the Code)
Your Job: Your test is passing, but your code might be messy. Now you clean it up, relying on your test suite to ensure you don’t break anything.
- How AI Helps: This phase removes the “fear of change.” With a trusted test suite in place, AI can safely propose structural improvements.
- You: “This function is too long. Can you refactor it into smaller, cleaner methods?”
- AI: Suggests a refactoring (see the sketch after this list).
- You: Run your test suite. If it’s still passing, you accept the change with confidence. If it fails, you know the AI’s suggestion was flawed and you reject it.
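As a purely illustrative sketch, assume the shipping function has since grown a distance check and a simple pricing rule (both assumptions added for illustration). A refactor the assistant might propose is to extract the guard clauses into a helper, with the test suite confirming behaviour is unchanged:
def _validate_inputs(weight: float, distance: float) -> None:
    # Guard clauses extracted from the main function during the refactor
    if weight < 0:
        raise ValueError("weight must be non-negative")
    if distance < 0:
        raise ValueError("distance must be non-negative")

def calculate_shipping_cost(weight: float, distance: float) -> float:
    _validate_inputs(weight, distance)
    # Assumed pricing rule, unchanged in behaviour by the refactor
    return weight * 0.5 + distance * 0.1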
In Action: A Simple Python Example
To illustrate this workflow end-to-end, let’s walk through a concrete example. Here, we will build a password validator, is_valid_password, using this TDD/AI workflow.
We have two files:
- validator.py (for our function)
- test_validator.py (for our tests)
Cycle 1: The First Requirement (Minimum Length)
Requirement: The password must be at least 8 characters long.
Red Phase: Writing the Test
Your Prompt: “Write two pytest tests for a function is_valid_password:
- test_fails_if_too_short (asserts False for ‘short’).
- test_passes_if_long_enough (asserts True for ‘longenough’).”
AI-Generated test_validator.py:
import pytest
from validator import is_valid_password

def test_fails_if_too_short():
    assert is_valid_password("short") == False

def test_passes_if_long_enough():
    assert is_valid_password("longenough") == True
Action: Run pytest. It fails (as expected) because is_valid_password doesn’t exist. We are Red.
Green Phase: Write the Code
Your Prompt: “The tests are failing. Write the code for the is_valid_password function in validator.py to make them pass.”
AI-Generated validator.py:
def is_valid_password(password):
    if len(password) < 8:
        return False
    return True
Action: Run pytest. It passes! We are Green.
Refactor Phase: Cleaning up the Code
Your Prompt: “Refactor is_valid_password to be more Pythonic.”
AI-Refactored validator.py:
def is_valid_password(password):
    """Checks if a password is valid based on length."""
    return len(password) >= 8
Action: Run pytest. It still passes! The refactor is verified.
With this first cycle complete, we now have a working baseline and a repeatable process for introducing new rules.
Cycle 2: Adding a New Rule (Uppercase)
Requirement: The password must now also contain at least one uppercase letter.
Red Phase: AI Adds a Test
Your Prompt: “A password like ‘longenough’ should now fail. Update the test suite to include a test test_fails_if_no_uppercase. Then rename test_passes_if_long_enough to test_passes_with_valid_password and use ‘LongEnough’ as its input.”
AI-Modified test_validator.py:
import pytest
from validator import is_valid_password

def test_fails_if_too_short():
    assert is_valid_password("Short") == False

def test_fails_if_no_uppercase():
    assert is_valid_password("longenough") == False  # New failing test

def test_passes_with_valid_password():
    assert is_valid_password("LongEnough") == True
Action: Run pytest. The new test_fails_if_no_uppercase fails. We are Red.
Green Phase: AI Fixes the Code
Your Prompt: “The test is failing. Update is_valid_password to also require an uppercase letter.”
AI-Generated validator.py:
import re

def is_valid_password(password):
    """Checks if a password is valid based on multiple criteria."""
    checks = [
        lambda s: len(s) >= 8,
        lambda s: re.search(r'[A-Z]', s) is not None
    ]
    return all(check(password) for check in checks)
(Bonus! The AI went a step further and refactored as it wrote, producing a list of checks so that more rules can be added easily.)
Action: Run pytest. All tests pass. We are Green. The AI assistant handled the new requirement and the refactor at once, and our tests prove it’s correct.
Here, the TDD loop reveals its true strength: adding requirements becomes predictable and low-risk.
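To make that concrete, suppose a later (hypothetical) requirement demanded at least one digit. The Red phase would be one new failing test, and the Green phase a single extra entry in the existing checks list, roughly like this sketch:
import re

def is_valid_password(password):
    """Checks if a password is valid based on multiple criteria."""
    checks = [
        lambda s: len(s) >= 8,
        lambda s: re.search(r'[A-Z]', s) is not None,
        # Hypothetical new rule: the password must contain at least one digit
        lambda s: re.search(r'[0-9]', s) is not None
    ]
    return all(check(password) for check in checks)
Any existing passing tests whose sample passwords lack a digit would fail immediately, which is exactly the safety net the loop provides.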
Next Stage: TDD, AI, and Mocking External Services
The previous example showed how to begin using TDD with AI. To demonstrate how this approach scales to real-world, multi-module systems, let’s look at a more realistic case: a call to an external system you don’t control, which means you need to introduce mocking.
Goal: Build a LoanService that approves a loan only if an external CreditService reports a score of 700+. We can’t actually call the CreditService in our unit tests, so we need to use mocks.
The External Service (We can’t change this):
# credit_service.py
def get_credit_score(user_id: str) -> int:
    # ... complex, slow network call ...
    print("WARNING: REAL CREDIT SERVICE WAS CALLED!")
    return 800
Cycle 1: The Happy Path (Application Approved)
Red Phase: AI Writes the Test and the Mock
Your Prompt: “I’m in test_loan_service.py. I need to test is_loan_application_approved from loan_service.py. This function uses get_credit_score from credit_service.py.
Write a pytest test called test_application_approved_for_good_credit using the mocker fixture.
- Patch loan_service.get_credit_score.
- Make the mock return a value of 750.
- Assert is_loan_application_approved(‘user-123’) returns True.”
AI-Generated test_loan_service.py:
import pytest
from loan_service import is_loan_application_approved

def test_application_approved_for_good_credit(mocker):
    # Patch the dependency where it's used
    mock_get_score = mocker.patch('loan_service.get_credit_score')
    # Set the mock's return value
    mock_get_score.return_value = 750

    assert is_loan_application_approved('user-123') == True
    # Verify the mock was called as expected
    mock_get_score.assert_called_once_with('user-123')
Action: Run pytest. It fails as the function doesn’t exist. We are Red.
Green Phase: AI Writes Code to Pass the Mock
Your Prompt: “The test is failing. Go to loan_service.py and write the minimal code for is_loan_application_approved to make it pass, using get_credit_score.”
AI-Generated loan_service.py:
from credit_service import get_credit_score
def is_loan_application_approved(user_id: str) -> bool:
score = get_credit_score(user_id)
return score >= 700
Action: Run pytest. It passes! And importantly, we did not see our warning message in the console. Our mock worked, and the AI-generated code passed our spec. We are Green.
With the happy path covered, the next cycle shows how the same loop keeps the code correct and resilient when the external dependency fails or behaves unexpectedly.
Cycle 2: The “Exception Path” (Service Fails)
This is the real test of robust code. What happens if the credit service is down?
Red Phase: AI Mocks an Exception
Your Prompt: “Add a new test: test_application_rejected_if_service_crashes.
- Mock get_credit_score to raise an exception using side_effect=Exception(‘Service Down’).
- Assert our function catches this and returns False (it should not crash).”
AI-Generated test_loan_service.py:
# ... (previous test) ...

def test_application_rejected_if_service_crashes(mocker):
    mocker.patch(
        'loan_service.get_credit_score',
        side_effect=Exception('Service Down')
    )
    # We expect our code to catch this and return False
    assert is_loan_application_approved('user-789') == False
Action: Run pytest. The test fails with Exception: Service Down. Our function doesn’t have error handling and crashes. We are Red.
Green Phase: AI Writes Robust, Error-Handling Code
Your Prompt: “The new test test_application_rejected_if_service_crashes is failing. Update is_loan_application_approved to catch any Exception from get_credit_score and safely return False.”
AI-Generated loan_service.py:
from credit_service import get_credit_score
import logging

def is_loan_application_approved(user_id: str) -> bool:
    try:
        score = get_credit_score(user_id)
        return score >= 700
    except Exception as e:
        # AI correctly adds the try/except and logging
        logging.warning(f"Credit service failed for {user_id}: {e}")
        return False  # Gracefully return False
Action: Run pytest. All tests pass!
The Takeaway: Context is Everything
By using a TDD approach with your AI coding assistants, you’re doing more than just generating code: you’re working in a highly resilient way based on solid engineering principles. TDD may have been around for a while, but it excels in an AI-driven workflow, shifting the focus from fixing unpredictable outputs to defining the right behaviour upfront and giving the AI the context it needs to deliver faster and more accurately.
At Nimble Approach, we help teams adopt and embed practices like TDD into their everyday development workflows, enabling them to move quickly with confidence. Find out how we can transform your testing approach and accelerate delivery here.