
By George Verney, Head of the Data Capability at Nimble Approach.

What is Test-Driven Development (TDD)?

Before we dive in – what is this thing?

Test-Driven Development (TDD) is a software development approach in which tests are written before the actual code.
By writing tests first, developers are forced to think about the requirements and functionality they are implementing, leading to more robust and reliable code.

I find that TDD is a little alien to most Data Engineers – and underused by most software engineers as well!

I started as a skeptic, too. Particularly when wearing my “data hat” 🥳

But the landscape of data engineering is changing, and I think the broad adoption of languages such as Python, R and Scala for our work at scale is placing more and more emphasis on the engineering aspect of our roles.

How I hope you feel by the end of this article 🤩

A Simple Example of TDD to Illustrate

Your Product Owner brings you some feature requirements for everyone’s favourite subject: dogs! 🐶

Dogs are wonderful. They come in all shapes and sizes. Tiny pups are still learning to howl, older dogs like to boof under their breath, other dogs simply woof, while others are quiet.

So you start to think about modelling your dogs as an object. In this exercise we’ll be using Python.

# dog.py
from dataclasses import dataclass

@dataclass
class Doggo:
    name: str
    age: int

    @property
    def sound(self) -> str:
        if self.age <= 1:
            return "Awoo!"
        if self.age > 8:
            return "Boof."
        return "Woof"

And because we’re good engineers, we’re going to write some tests for our Doggo object…

# test_dog.py
from dog import Doggo

def test_doggo_sound_puppy():
    expected = "Awoo!"
    actual = Doggo(name="Layla", age=1).sound
    assert expected == actual, "Puppies howl cutely!"

def test_doggo_sound_senior():
    expected = "Boof."
    actual = Doggo(name="Charlie", age=14).sound
    assert expected == actual, "Older dogs boof under the breath"

def test_doggo_sound_default():
    expected = "Woof"
    actual = Doggo(name="Bob", age=7).sound
    assert expected == actual, "Your average doggo woofs"

100% test coverage – aren’t we amazing! 🤘 Smash this into a pull request and bask in the glory of another perfect lump of code… right? Close.

What’s wrong? All my tests pass

True. All your tests pass. Your code has 100% coverage. Your peers find no fault during review and you get the code merged without issue.

But you’ve missed something in your rush towards perfection…

Let’s revisit the requirements:

Dogs are wonderful. They come in all shapes and sizes. Tiny pups are still learning to howl, older dogs like to boof under their breath, other dogs simply woof, while others are quiet.

I was so blinded by writing code about my favourite subject – one I’m an expert in – that I missed something 😳

For instance: did you know that Basenjis are nicknamed the “barkless dog”?

Let’s try again, using TDD

If we shift our mindset to writing the tests first, we are much more likely to consider all the requirements.

Here we go again – this time starting with a “stub” class, then the tests before the implementation code.

# dog.py
class Doggo:
    @property
    def sound(self) -> str:
        pass

# test_dog.py
from unittest import mock

from dog import Doggo

def test_doggo_sound_puppy():
    expected = "Awoo!"
    actual = Doggo(name="Layla", age=1).sound
    assert expected == actual, "Puppies howl cutely!"

def test_doggo_sound_senior():
    expected = "Boof."
    actual = Doggo(name="Charlie", age=14).sound
    assert expected == actual, "Older dogs boof under the breath"

def test_doggo_sound_default():
    expected = "Woof"
    actual = Doggo(name="Bob", age=7).sound
    assert expected == actual, "Your average doggo woofs"

def test_doggo_sound_quiet():
    expected = None
    actual = Doggo(name=mock.ANY, age=mock.ANY).sound
    assert expected == actual, "Some dogs don't make noise"

Note the last test case. We don’t know how we’re going to implement it yet, but we know that it’s a scenario we need to cover.

Obviously none of these pass right now – but once they all do, we know our doggos are behaving correctly!

Here’s one potential approach:

# dog.py
from dataclasses import dataclass

@dataclass
class Doggo:
    name: str
    age: int
    is_mute: bool = False

    @property
    def sound(self) -> str | None:
        if self.is_mute:
            return None
        if self.age <= 1:
            return "Awoo!"
        if self.age > 8:
            return "Boof."
        return "Woof"

This requires a minor update to the final test case, namely adding a new keyword argument to that instance:

def test_doggo_sound_quiet():
    expected = None
    actual = Doggo(name=mock.ANY, age=mock.ANY, is_mute=True).sound
    assert expected == actual, "Some dogs don't make noise"

So why should you care about TDD? Lessons learned

The biggest shift for me wasn’t technical – it was mindset.

Prioritising your test cases first will help you:

  • Build the right thing first time round (led by the acceptance criteria)
  • Only write the code that is needed (avoid rabbit holes on features not in the spec)
  • Focus on a modular design (encourages simpler-to-test code in a workflow that narrows focus to each test case)
  • Increase code quality (helps to follow the K.I.S.S. principle and gives you “self-documenting” code via the test cases)
  • Reduce risk of regressions (test coverage FTW!)
  • Save you time and effort on refactoring (your existing tests protect against regressions during refactoring – see the sketch below)
  • And most importantly: still allow you to write and ship awesome code! (don’t forget that “test code” is still code… you just might not love it yet)

The end result is that your users will be happier, as you’ve produced a rock-solid product that matches their asks! 🎉
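
To make that refactoring point concrete, here’s a hedged sketch of one way the sound property could later be rewritten using structural pattern matching (Python 3.10+). The behaviour is unchanged, so the existing test_dog.py suite should stay green without touching a single test:

# dog.py – same behaviour, different implementation.
from dataclasses import dataclass

@dataclass
class Doggo:
    name: str
    age: int
    is_mute: bool = False

    @property
    def sound(self) -> str | None:
        # The tests written earlier act as the safety net for this refactor.
        match (self.is_mute, self.age):
            case (True, _):
                return None
            case (_, age) if age <= 1:
                return "Awoo!"
            case (_, age) if age > 8:
                return "Boof."
            case _:
                return "Woof"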

Next Steps

Now you’re fully on-board, where do you go next?

Practice

  • A FizzBuzz coding kata is a classic problem you’ve probably come across before – but how about writing your tests first? (See the sketch after this list.)
  • Advent Of Code – a fun, annual, series of coding challenges that really lend themselves to TDD (they give you a bunch of test scenarios in the question!). The first puzzle is a great place to start.
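
To make that first kata concrete, here’s a minimal test-first sketch. The fizzbuzz module, function name and expected strings are assumptions for the exercise, not a fixed spec – write these first, watch them fail, then implement:

# test_fizzbuzz.py – written before any fizzbuzz() implementation exists.
# The module and function names here are hypothetical; adapt them to your kata.
from fizzbuzz import fizzbuzz

def test_multiples_of_three_return_fizz():
    assert fizzbuzz(3) == "Fizz"

def test_multiples_of_five_return_buzz():
    assert fizzbuzz(5) == "Buzz"

def test_multiples_of_both_return_fizzbuzz():
    assert fizzbuzz(15) == "FizzBuzz"

def test_other_numbers_come_back_as_strings():
    assert fizzbuzz(7) == "7"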

Python Libraries

  • Is it bad form to suggest RTFM? The Python standard library unittest module is probably my most visited resource for all Python development!
  • Despite the above, I’m a pytest-by-default user now, largely due to fixturing – see the sketch after this list.
  • Start reporting on your code coverage (coverage|pytest-cov) to know where you’re currently at.
  • Consider diff-cover for new work, ensuring your additions and changes have tests.
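
As a taster of why fixtures win me over, here’s a small sketch of how the Doggo tests might look in pytest. The fixture name and the parametrised cases are just one possible layout, not the “right” one:

# test_dog.py – pytest flavour.
import pytest

from dog import Doggo

@pytest.fixture
def senior_doggo() -> Doggo:
    # Build the object in one place; any test that asks for
    # `senior_doggo` receives a fresh instance.
    return Doggo(name="Charlie", age=14)

def test_doggo_sound_senior(senior_doggo):
    assert senior_doggo.sound == "Boof.", "Older dogs boof under the breath"

# Parametrisation keeps the age-based cases in a single test.
@pytest.mark.parametrize(
    ("age", "expected"),
    [(1, "Awoo!"), (7, "Woof"), (14, "Boof.")],
)
def test_doggo_sound_by_age(age, expected):
    assert Doggo(name="Layla", age=age).sound == expected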

Want to bring TDD practices into your data workflows?

At Nimble Approach, we help teams embed quality engineering at every stage, from pipelines to platforms.

👉 Get in touch with us to find out how we can support your team.