Today, test-driven development (TDD) is an integral practice in many software projects. However, it is still difficult to master and presents significant challenges and risks if applied incorrectly. In this article, we discuss the idea of test-driven development in general, present some common pitfalls to avoid, and finally discuss the impact of test-driven development on design. The lessons are based on experience we gained in software projects over the past few years.
Part 1: Theory
First, let us briefly review what test-driven development is all about. To keep the article short and on point, we focus primarily on implementation and ignore other activities such as requirements engineering and documentation.
In a sequential, predictive process, the simplified ideal life cycle of a software feature looks like this: first you implement it, then you test it and fix the detected bugs, and then you are done.
The problem with this approach is that it is generally very hard to predict how long a complex activity, as much of software development is, will take. When the deadline approaches and development takes longer than planned, there are basically two options: move the deadline or cut testing activities. Under time pressure, the usual decision is to drop some testing, deliver what is available (promising to fix any bugs discovered later), and hope for the best.
We all know how this plays out. Essentially, this fixes scope and leaves quality variable. In the long term, it leads to an accumulation of bugs and technical debt which, left unchecked, degrades team performance and morale. As the interest payments on that debt rise, the process eventually breaks down because new features become too difficult to implement. At this stage of the project, life is hell.
How test-driven development helps
To avoid these kinds of problems, test-driven development makes the following changes to the process:
- split the process into many short micro-iterations
- in each micro-iteration, write the test code before the implementation code, make sure all tests pass, and refactor “mercilessly” to keep the design malleable
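The following minimal Python sketch shows the order of work in one such micro-iteration. The function `slugify` and its rules are hypothetical and chosen only for illustration. The tests exist before the implementation, and the implementation is the simplest code that turns them green. In practice, each test would be added in its own micro-iteration. They are shown together here for brevity.

```python
import re
import unittest


# Step 1 (red): these tests are written first and fail, because the
# hypothetical function `slugify` does not exist yet.
class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation(self):
        self.assertEqual(slugify("TDD: Theory & Practice!"), "tdd-theory-practice")


# Step 2 (green): the simplest implementation that makes both tests pass.
# Step 3 (refactor): clean it up while re-running the tests to keep them green.
def slugify(title: str) -> str:
    # Keep only runs of lowercase letters and digits, then join them with hyphens.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

Saved as, say, `test_slugify.py`, the file can be run with `python -m unittest test_slugify`. After each refactoring step, the same command confirms that the tests are still green.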
Since the test code is written first and the objective is to make and keep the tests green at all times, the development is said to be test-driven. Essentially, this trades fixed scope for fixed quality: if a feature cannot be delivered within the given time constraints, it is naturally de-scoped. This is usually acceptable, because most features will still be completed in time, and it is preferable to deliver some of the features with confidence that they are implemented correctly than to try to squeeze all of them in while failing to ensure consistent quality.
These micro-iterations should be really short. In fact, the whole process can be viewed as three continuous sub-processes running in parallel: writing tests, writing implementation code, and refactoring.
These sub-processes form a symbiotic relationship, constantly affecting each other, which ultimately results in fewer defects, better design and higher productivity. This is the main value proposition of test-driven development.
In its strictest form, test-driven development demands that absolutely no production code be written without a failing test first. Although this certainly ensures complete coverage, we believe the rule can be relaxed and that it is sufficient to require that:
- eventually all production code is covered by automated tests
- the development of production code and test code proceeds in parallel
- continuous refactoring is part of the routine
- test code receives the same treatment as production code
Failing to adhere to these practices increases the risk of incurring the overhead of test-driven development without reaping its promised benefits. In the following parts, we look at more concrete and subtle issues that we encountered while adopting this method in our projects.
Part 2: Pitfalls of test-driven development
In our projects, we observed that despite high test coverage, a large number of defects went undetected by the tests and surfaced only after deployment to production. This is obviously a problem, since the ultimate goal of testing is to detect defects early. Furthermore, code quality did not improve as much as we had expected. Drilling down into the actual causes, we discovered three patterns.
First, there was no clear understanding and specification of the intended behavior of the units under test. The fixtures were constructed based on assumptions that simply did not hold under real production conditions. For example, we assumed that the input data would be of higher quality than was actually the case, which caused many unexpected failures in production.
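As a hypothetical illustration of this first pattern, compare a fixture written under the optimistic assumption with fixtures modeled on what real input data tends to look like. The `parse_age` function and the specific edge cases are our assumptions and are not taken from the projects described.

```python
import unittest
from typing import Optional


def parse_age(raw: str) -> Optional[int]:
    """Hypothetical parser for an 'age' field read from an external source.

    Returns None for values that are missing, malformed, or implausible.
    """
    cleaned = raw.strip()  # real data often carries stray whitespace
    if not cleaned.isdecimal():  # rejects "", "n/a", "-5", "4.2"
        return None
    age = int(cleaned)
    return age if age <= 150 else None  # treat implausible values as missing


class ParseAgeTest(unittest.TestCase):
    def test_clean_input(self):
        # The kind of fixture written under the optimistic assumption.
        self.assertEqual(parse_age("42"), 42)

    def test_messy_input(self):
        # Fixtures modeled on what the data actually looks like.
        cases = {" 42\n": 42, "": None, "n/a": None, "-5": None, "4.2": None, "999": None}
        for raw, expected in cases.items():
            with self.subTest(raw=raw):
                self.assertEqual(parse_age(raw), expected)
```

Only the second test exercises the rejection branches. A suite containing just the first test would pass while saying nothing about the inputs that actually cause failures in production.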
Second, the units under test did in fact exhibit erroneous behavior during testing, but the tests were not able to detect it. This shows that a high level of test coverage alone does not make a test suite effective. It also became evident that designing effective tests is a challenging discipline that requires the same level of care and thought as developing production code. This leads us to the third issue.
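The following contrived, hypothetical example illustrates this second pattern. The weak check runs every line of the buggy function, so a coverage tool would report full coverage, yet it does not notice the bug. The strong check pins down concrete expected values and catches it. Running checks against a deliberately faulty variant to see whether they fail is the idea behind mutation testing. We mention it here only as one possible way to assess test effectiveness beyond coverage.

```python
import math


def discounted_price(price: float, percent: float) -> float:
    """Intended behavior: reduce `price` by `percent` percent."""
    # Bug: subtracts the percentage as an absolute amount.
    return price - percent


def fixed_discounted_price(price: float, percent: float) -> float:
    return price * (1 - percent / 100)


def weak_check(fn) -> bool:
    # Executes every line of fn (full line coverage), but only verifies
    # that *some* discount was applied.
    return fn(100.0, 10.0) < 100.0


def strong_check(fn) -> bool:
    # Checks concrete expected values. The second case distinguishes
    # "10 percent" from "10 currency units"; the first case alone cannot.
    return math.isclose(fn(100.0, 10.0), 90.0) and math.isclose(fn(200.0, 10.0), 180.0)
```

Note that even the first case of the strong check passes for the buggy version, because 10 percent of 100 happens to equal 10. Choosing inputs that actually discriminate between correct and incorrect behavior is part of the design work.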
Treating test code differently
This is perhaps the biggest problem of all. We observed that test code was designed differently from production code. Best practices routinely applied to implementation design were not applied as rigorously to test code. For example, duplication and coupling were much higher in the test code than in the production code. One possible explanation is the belief that, unlike production code, test code only runs during development and is therefore never exposed to real users. This is short-sighted: test code is an integral part of the code base and must evolve over the life cycle of the project. We discuss this issue and its further implications for design in the next part.
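One common remedy for duplication in test code is a test data builder: a single function that knows how to construct a valid object, so that each test states only the fields it actually cares about. The sketch below uses assumed names throughout. `Order`, `shipping_cost`, the shipping rules, and the builder `an_order` are all hypothetical. If `Order` later gains a required field, only the builder has to change, not every test.

```python
import unittest
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    customer: str
    items: tuple[tuple[str, int], ...]  # (sku, quantity) pairs
    express: bool = False


def shipping_cost(order: Order) -> int:
    # Hypothetical rule: free base shipping from 10 items, plus an express surcharge.
    base = 5 if sum(qty for _, qty in order.items) < 10 else 0
    return base + (15 if order.express else 0)


# Test data builder: the only place in the tests that knows how to build a valid Order.
_DEFAULT_ORDER = Order(customer="alice", items=(("sku-1", 1),))


def an_order(**overrides) -> Order:
    return replace(_DEFAULT_ORDER, **overrides)


class ShippingCostTest(unittest.TestCase):
    def test_small_order_pays_base_rate(self):
        self.assertEqual(shipping_cost(an_order()), 5)

    def test_large_order_ships_free(self):
        self.assertEqual(shipping_cost(an_order(items=(("sku-1", 10),))), 0)

    def test_express_adds_surcharge(self):
        self.assertEqual(shipping_cost(an_order(express=True)), 20)
```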
Part 3: Impact on design
In theory, test-driven development should have a positive impact on overall design quality, but contrary to our expectations, this was not always the case. In some instances, automated tests even made it harder to improve the design, so there is something interesting going on. Again, drilling down into the actual issues, we identified two aspects.
The first may sound trivial: effective tests take a lot of time to write. Simplistically speaking, this leads to the following problem: since the available time is limited, the more effort a developer puts into creating test fixtures, the less time is left for exploring and evaluating design options.
The second is that, although test-driven development is a design practice, applying it effectively requires thinking of test code not only as a verification mechanism but as a dedicated design tool. Effective design requires exploration, experimentation, and iteration. In practice, however, test code is often written merely to verify the implementation “as is”, implicitly treating it as fixed instead of as a transient point in the design space that can move at any time during the process.
Writing test code does not by itself improve the quality of the implementation. Applied sensibly, it can facilitate that goal; applied mechanically, it can even be detrimental. For example, poorly designed test code can inhibit refactoring by increasing the coupling of the whole code base. When this happens, the test code becomes the bottleneck: changing some aspect of the implementation breaks many tests at once. Thus, test quality has a direct impact on implementation quality, and you cannot consider the two separately.
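The following hypothetical sketch shows how test code can increase coupling. The `CartV1`/`CartV2` classes and both checks are invented for illustration. `brittle_check` inspects the cart's private representation, while `behavioral_check` uses only the public interface. After a behavior-preserving refactoring from `CartV1` to `CartV2`, only the brittle check breaks.

```python
class CartV1:
    def __init__(self):
        self._items = {}  # sku -> total quantity

    def add(self, sku: str, qty: int = 1) -> None:
        self._items[sku] = self._items.get(sku, 0) + qty

    def quantity(self, sku: str) -> int:
        return self._items.get(sku, 0)


class CartV2:
    # Refactored internals: a log of additions. Observable behavior is unchanged.
    def __init__(self):
        self._items = []  # list of (sku, qty) additions

    def add(self, sku: str, qty: int = 1) -> None:
        self._items.append((sku, qty))

    def quantity(self, sku: str) -> int:
        return sum(qty for s, qty in self._items if s == sku)


def brittle_check(cart_cls) -> bool:
    cart = cart_cls()
    cart.add("apple")
    cart.add("apple")
    # Asserts on the internal representation, so it is coupled to CartV1's design.
    return cart._items == {"apple": 2}


def behavioral_check(cart_cls) -> bool:
    cart = cart_cls()
    cart.add("apple")
    cart.add("apple")
    # Asserts only on observable behavior through the public interface.
    return cart.quantity("apple") == 2 and cart.quantity("pear") == 0
```

Multiplied across a test suite, checks like `brittle_check` are what turn a harmless refactoring into a wave of failing tests.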
On the other hand, focusing on getting the design right can result in an implementation that requires only a fraction of the testing. If this seems implausible, consider programming languages with very strict type systems, such as many functional languages. Programs written in such languages often take more effort to get right, but once the compiler is satisfied, they mostly “just work”. Unit testing such programs creates less additional value than testing programs written in languages with weaker type systems, because large parts of the requirements and specification are encoded in the types, and a powerful compiler can verify much of the implementation's correctness by checking whether the types align. In this case, creating the right types is an essential design activity that minimizes the need for testing, and the same principle extends to other design practices as well.
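Python has no compiler that enforces such guarantees, so the sketch below only approximates the idea. The constraint “a percentage lies between 0 and 100” is encoded in a small type whose constructor rejects invalid values. `Percentage` and `apply_discount` are hypothetical names. Functions that accept a `Percentage` need no range checks and no tests for invalid percentages. A static type checker such as mypy can additionally flag call sites that pass a raw `float` instead. In a language with a stricter type system, more of this checking would move to compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    value: float

    def __post_init__(self) -> None:
        # The invariant is checked once, at construction (at runtime in Python).
        if not 0 <= self.value <= 100:
            raise ValueError(f"percentage out of range: {self.value}")


def apply_discount(price_cents: int, discount: Percentage) -> int:
    # No range check here, and no need to test invalid percentages:
    # an out-of-range Percentage cannot be constructed in the first place.
    return round(price_cents * (100 - discount.value) / 100)
```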
To actually achieve the synergy of the test-code-refactor cycle described in the first part, we believe that some basic principles must be followed. Otherwise, there is a significant risk of not getting the desired results, or even of obtaining detrimental ones. In addition, we found the following ideas helpful.
First, assume variability by default. Getting the design right the first time is impossible. Therefore, tying the test code too closely to an imperfect design will only hinder refactoring and severely limit agility. Remember that the requirements and your understanding of the domain will change, so be prepared.
Second, iterate often. Never assume that after implementing, testing, and refactoring a feature, you are done. In software development, the most important activity is understanding the problem, and the more you iterate, the more you learn about it. Obviously, there is a trade-off, and you should stop and move on once the current design is “good enough”. In practice, however, we observed that, especially under time pressure, we tend to gravitate toward the other end of the spectrum. In the extreme case, we stop at the first idea that comes to mind, implement it, and never touch that code again.
Having a comprehensive, passing test suite does not indicate the architectural integrity of the system; only careful analysis, a sound understanding of the problem domain, and proper design can ensure that.
We found that there is sometimes a troubling misconception that a comprehensive, passing test suite indicates good design. The problematic reasoning goes like this: “we have a lot of tests and they are all green, therefore everything is fine”. Although in theory any software developer will understand and agree that “you cannot test quality into a product”, this truth sometimes gets ignored in practice.
In fact, these two concerns (test coverage and product quality) are independent. One can easily imagine a high-quality system running in production, delivering a lot of value to its users, without a single automated test: just delete all the tests after deploying the system and completing its extensive validation. Conversely, it is conceivable to achieve 100% test coverage and still have an extremely brittle or unusable system; in this case, the tests provide no value at all and are just as useless as the system itself.
Furthermore, the test code itself does not deliver any value to the users of the system. It is writing and executing the tests that creates value, and once the tests have been executed and the system has been designed, deployed, and put to use, they can be safely deleted. In other words, once the system is developed and all tests pass, running the same test suite again yields no additional information about the system.
Of course, this view of testing is very simplistic: a system usually must evolve continuously to meet ever-changing user needs. In that case, building and maintaining a comprehensive test suite provides a safety net against regressions and continues to support the design process.
All in all, we believe that the problems described above stem mostly from the fact that we did not treat test code and implementation code equally. As mentioned in the first part, this is key to applying test-driven development successfully in practice. However, it turned out to be very difficult, because it takes real discipline to follow through.
Test-driven development is an integral technique for achieving high quality and is part of the everyday practice of many software developers today. However, it is still a challenging discipline that takes time and practice to master. You also need to follow best practices and be very disciplined; otherwise, you risk wasting time without getting the expected benefits. Additionally, to be applied effectively, writing test code must be explicitly treated as a design tool, not just as a verification mechanism. Finally, this practice can only complement other design activities, not replace them.