
Writing Better Tests With JUnit


TL;DR
Writing readable tests is at least as important as writing readable production code. But the standard JUnit tooling won’t help us. In order to create a readable, maintainable, useful test suite, we need to change our testing habits. The focus must be on behavior, instead of implementation.

If you’re a developer, and you’re even remotely like me, you’re writing tests. Lots of tests. In fact, you write tests for everything: Unit tests to verify your classes do what you intend them to do. Integration tests to check your modules and configurations. Acceptance tests to prove that the system has exactly the features you were asked to implement. UI tests, smoke tests, etc. etc.1

Most of the time, these tests aren’t the actual focus of our development; the system under test (SUT) is. The tests just “happen” along the way. And while we think so much about the production code – its architecture, method and variable names, even whether to put opening braces on the same line2 or on the next – while we put so much effort into making that code “great”, we often neglect to apply the same care to our tests. Tests, however, also contain code. Code that carries tremendous value: It represents our knowledge about what the system is supposed to do. It is, in fact, in some ways even more valuable than the production code itself.3

The Value of Test Code

Writing fast, bug-free code is easy, if it’s a) not too complex and b) written once, then never touched again. Of course, if you’re working on anything other than a throw-away mobile game, this is never the case. So in order to keep your code quality and maintainability as high as possible, you need tests. Tests allow you to add new features without breaking what’s already there. Tests help you to make changes to your architecture without damaging behavior. Tests enable you to find newly introduced bugs early and with little additional effort. And of course, you knew that already…

But tests also serve another purpose.

Have you ever worked on a team where you had to integrate a new colleague half-way through the project? Easy, right? Just give her the link to the wiki and a comprehensive list of Word documents and UML diagrams, and she’ll be on track in no time… Except she won’t: Because documentation rots even faster than untested code. Documentation often remains in its original state, even if the system changes dramatically. I would even dare to say that most documentation is already outdated the second it is written. And even though your IT architect will probably disagree: UML diagrams say little about what actually happens in a running system, however well they are drawn.

Well-written tests, on the other hand, will tell you all of these things (and more):

  • How to access the API
  • What data is supposed to go in and out
  • What possible variations of an expected behavior exist
  • What kind of exceptions might occur, and what happens if they do
  • How individual parts of the system interact with others
  • Examples of a working system configuration

And last, but most definitely not least:

  • What the customer expects the software to do

The value of tests as a form of living documentation cannot be overestimated. Especially on larger or long-running projects, a good test suite will not only help with onboarding new team members, but also when revisiting older parts of the code base, when reviewing someone else’s code4, or when looking for the occasional bug that somehow made it through the safety net.5

The problem with JUnit

One of the oldest and arguably most widely used test frameworks around is JUnit, originally written by Kent Beck and Erich Gamma in 1998.
It is the Java version of SUnit, one of the first unit testing frameworks, and the “mother” of the xUnit family. It was written with a very simple, rather technical concept in mind: Individual test cases, with verifiable test results, organized in test suites. It does not, however, include a manual on how big or small each test case should be. Nor does it provide the means to take care of documentation. With just JUnit’s built-in features, these problems remain unsolved, and it is up to the programmer to come up with a solution – or not.

To add to the confusion, many of the IDE plugins supporting JUnit project a very one-dimensional idea of a test case: When you create a class in Eclipse or IntelliJ, you are easily directed to tools that create a “matching” test, i.e. a test class that contains stubs to call each public method on your production class, and optionally includes setUp() and tearDown() methods, which are run before and after each individual test stub, or before and after all of the tests.

For example, if you had a class like this:

public class MyFancyClass {
  public boolean hasFancyProps() {
    return true;
  }
 
  public void myFancyMethod() {
  }
}

You’d be offered this template JUnit 4 test case:

public class MyFancyClassTest {
  @BeforeClass
  public static void setUpClass() throws Exception {
    // run once before any of the tests; JUnit 4 requires this method to be static
  }
 
  @Before
  public void setUp() throws Exception {
    // run before each test
  }
 
  @Test
  public void testHasFancyProps() throws Exception {
    // call the hasFancyProps() getter
  }
 
  @Test
  public void testMyFancyMethod() throws Exception {
    // call the myFancyMethod() method
  }
 
  @After
  public void tearDown() throws Exception {
    // run after each test
  }
 
  @AfterClass
  public static void tearDownClass() throws Exception {
    // run once after all tests; JUnit 4 requires this method to be static
  }
}

Why is this not helpful? Because while “provide one test stub for everything in the public API” seems like a smart enough concept, it completely ignores both the fact that any moderately sophisticated class reacts differently within different surroundings (i.e. contexts), and the possibility that someone else might need to read and understand what exactly this test does.

What We Can Do

Above all, obviously, we should always apply the same care to our test code that we do to our production code. That means constant refactoring, removing code duplication, keeping methods short and readable, applying the SOLID principles, using comments only when absolutely necessary, and so on.

In addition, there are some measures that apply specifically to testing.

1. Test behavior, not implementation

If we set our testing focus on implementation details (e.g., “did we call this method”, or “did we set this variable value”), we create fragile tests: Any time we change even little things in the production code, the tests will break. We will have to re-evaluate our logic every time: Did it break, because we changed how values are stored? Are the expected values still valid? Do we have to access different methods/variables to get the correct results? In short: Does the test need to be changed, because it no longer checks what we expected it to check? Or did it actually break, because we broke the algorithm?

Just by considering these questions, it should be easy to see that repairing brittle tests requires much additional effort, thus making code changes hard and tedious (not to mention: expensive). What we really expect from a test suite, though, is that it should enable us to change our code, not hinder us from doing it!

To overcome the brittleness we need to change our tactics: Instead of checking implementation, we must focus on behavior. Behavior is, by definition:

[…] the range of actions and mannerisms made by individuals, organisms, systems, or artificial entities in conjunction with themselves or their environment
from Wikipedia

“Range of actions and mannerisms” – this explicitly limits our view to what is observable from the outside. If we refrain from disclosing internals, and phrase our tests accordingly, they should become much more flexible, and enable us to refactor, replace and/or rewrite large parts of the production code without additional effort – a true “safety net” that we can rely on. Once implemented, they should always remain green, unless the behavior changes, and therefore only turn red during refactoring, if we made a mistake.
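To make the distinction concrete, here is a minimal sketch in plain Java (the ShoppingCart class and all names in it are invented for this example): the check looks only at the observable result, so the internal storage could be replaced without breaking the test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class under test. Its internal storage is an implementation
// detail - it could be a List, a Map, or a running sum - and a behavior-focused
// test must not depend on it.
class ShoppingCart {
  private final List<Integer> pricesInCents = new ArrayList<>();

  void add(int priceInCents) { pricesInCents.add(priceInCents); }

  int totalInCents() {
    int sum = 0;
    for (int p : pricesInCents) sum += p;
    return sum;
  }
}

public class ShoppingCartBehaviorExample {
  public static void main(String[] args) {
    // Behavior-focused check: given two items, the total should be their sum.
    // We never peek at the internal list - only at the observable result.
    ShoppingCart cart = new ShoppingCart();
    cart.add(150);
    cart.add(250);
    System.out.println(cart.totalInCents()); // prints 400
  }
}
```

In a real suite this check would live in a JUnit test method; the point is only what it asserts, not where it runs.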

More recent testing frameworks, which were born in the wake of Behavior Driven Development, such as Ruby’s RSpec, JavaScript’s Jasmine and the like, have a strong focus on behavior and documentation built in: Their syntax allows actual text as a way to describe what is happening, and why. Obviously, JUnit lacks a similar mechanism. But fortunately, we can still borrow some concepts and vocabulary to rephrase our own test methods.

BDD expects requirements to be written in the form of “Given-When-Then” statements, i.e.:

Given a precondition
When a thing happens
Then a result should be observable

This corresponds to the so-called “triple A” pattern: Arrange the preconditions and inputs, Act on the test object, then Assert the results.

We can easily apply this style to JUnit tests, by simply renaming our test methods in the Given-When-Then syntax:

public class MyFancyClassTest {
  private MyFancyClass sut;
 
  @Test
  public void givenAFreshFancyClass_whenCallingMyFancyMethod_shouldHaveFancyProps() throws Exception { 
    sut = new MyFancyClass(); // precondition
 
    sut.myFancyMethod();      // thing happens
 
    assertTrue( sut.hasFancyProps() ); // correct result: true
  }
}

Of course, this quickly leads to very long method names (we’ll fix that soon, I promise), but it enables us to think about our tests in a better way, and we can now describe what the test actually does. Great!
Readability can be improved by extracting the “arrange” and “assert” sequences into individual methods, like so:

public class MyFancyClassTest {
  private MyFancyClass sut;
 
  @Test
  public void givenAFreshFancyClass_whenCallingMyFancyMethod_shouldHaveFancyProps() throws Exception { 
    givenAFreshFancyClass();
 
    sut.myFancyMethod();      // thing happens
 
    assertFancyProps();
  }
 
  private void givenAFreshFancyClass() {
    sut = new MyFancyClass(); 
  }
 
  private void assertFancyProps() {
    assertTrue( sut.hasFancyProps() );
  }
}

Not only will this help to understand the test, it also makes its code more reusable: We can structure and group setup procedures by extracting more “given” methods, and we can reuse complex assertions by parameterizing the extracted methods. You’ll also notice that this way of phrasing makes it much easier to come up with the next test to write, when you’re going test-first: After all, we’re simply stating what we expect the class to behave like, given one or more preconditions – that’s not a hard thing to think about, and it can easily be done “before the fact”.

But if you start structuring your tests in this manner, you’ll also notice some effects on your production code: Class method names will be directly related to the expected action they should trigger, getter names will represent expected results. You will write code that produces fewer side effects. It will be easier to apply the Single Responsibility Principle, to find the right abstractions to use, and to see where to extract or move things around. And you will be able to grasp the meaning of things you wrote last week (or last month) much more quickly than before.

2. Group tests by Context

The preconditions that are required for a specific behavior to occur can be considered its context. In order to keep our tests organized, we should make sure that test methods which focus on behavior within the same context are grouped closely together. This helps us to find each individual test, and to understand the behavior of the system as a whole more quickly and easily.

One really good way to do this is by using Stefan Bechtold’s HierarchicalContextRunner. It uses inner classes to structure tests into a tree of contexts, both allowing more fine-grained setUp() and tearDown() methods, and shorter method names, while keeping things readable. Here’s an example:

@RunWith(HierarchicalContextRunner.class)
public class MyFancyClassTest {
  private MyFancyClass sut;
 
  @Before 
  public void setUp() throws Exception {
    // runs before each test, maybe calling static initializers
  }
 
  @Test 
  public void shouldTestSomeBehaviorUnderAnyCircumstances() throws Exception {
    // runs without further context   
  }
 
  public class GivenAFreshFancyClass {
    @Before 
    public void setUp() throws Exception {
      // runs after the outermost setUp() method
      sut = new MyFancyClass(); 
    }
 
    public class WhenCallingMyFancyMethod {
      @Before 
      public void setUp() throws Exception {
        // runs before each test within the context, 
        // after the outermost setUp() 
        // AND the one in GivenAFreshFancyClass
        sut.myFancyMethod();
      }
 
      @Test
      public void shouldHaveFancyProps() throws Exception { 
        assertTrue( sut.hasFancyProps() );
      }
    }
 
    public class WhenCallingMyOtherFancyMethod {
      @Before 
      public void setUp() throws Exception {
        // runs before each test within the context, 
        // after the outermost setUp() 
        // AND the one in GivenAFreshFancyClass
        sut.myOtherFancyMethod();
      }
 
      @Test
      public void shouldNotHaveFancyProps() throws Exception { 
        assertFalse( sut.hasFancyProps() );
      }
    }
  }
}

Grouping tests this way is very powerful: It reduces the amount of code in setUp() and tearDown() to what’s actually different between contexts, makes method names more readable, removes the need for private helper methods, and even allows for code folding, as well as a tree of test results, e.g. in IntelliJ’s JUnit runner window.
(Screenshot: IntelliJ’s JUnit runner displaying the nested contexts as a tree of test results.)
Granted, this sample doesn’t quite capture the extent of how much more readable the tests become. Let me assure you: Especially when you’re working with third-party code, and your tests require a lot of mocking and stubbing, you will absolutely love HierarchicalContextRunner.

3. Enforce the Single Assertion Rule

One of the most common test anti-patterns is the “Free Ride”, a.k.a. “Piggyback”: A second assertion “rides along” in an existing test method, rather than prompting the creation of a new test. Not only does this obscure the intention of the test, it also leads to less valuable test results: Why did the test break? Can we tell at one glance, or do we have to check line numbers to figure out which assertion failed?

We should always try to limit each test method to a single assertion. However, this does not necessarily mean there can only be one call to Assert.assertEquals() or the like – we can group several statements that belong together semantically into a single assert method of our own design. This also makes our code more readable, because it assigns meaning to an anonymous block of assert statements that we might otherwise have had to explain with a comment.
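As a sketch of such a custom assert method (the Address class and all values in it are invented for this example), several semantically related checks are folded into one named assertion. In a real JUnit test the body would typically call Assert.assertEquals; a plain check keeps the sketch self-contained:

```java
// Hypothetical data class used by the assertion below.
class Address {
  final String street, city, zip;
  Address(String street, String city, String zip) {
    this.street = street; this.city = city; this.zip = zip;
  }
}

public class SingleAssertionExample {
  // One semantic assertion instead of three anonymous ones: the method name
  // tells the reader WHAT is being verified, not just that three fields match.
  static void assertIsBerlinHeadquarters(Address address) {
    if (!"Unter den Linden 1".equals(address.street)
        || !"Berlin".equals(address.city)
        || !"10117".equals(address.zip)) {
      throw new AssertionError("not the Berlin headquarters: " + address.street);
    }
  }

  public static void main(String[] args) {
    assertIsBerlinHeadquarters(new Address("Unter den Linden 1", "Berlin", "10117"));
    System.out.println("ok"); // prints ok - the assertion passed
  }
}
```

A failing test now reports “not the Berlin headquarters” instead of an anonymous expected/actual pair, which answers the “why did it break?” question at one glance.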

4. Choose meaningful names

Obviously, we should apply the same care to choosing variable and field names that we apply to choosing method names. This means we should avoid “empty” names like a, test1, data or the like under all circumstances, and instead try to find names that actually explain the meaning of the things we pass around, such as userWithoutPassword, requestWithoutHeaders, and so on.

This will further eliminate the need for comments, as well as require us to think about when and where we create our test doubles and data containers.

5. Avoid complex configuration

In order to keep our tests fast and snappy, we should try to avoid overly complex or bloated configuration. This applies especially to extensive use of dependency injection frameworks like Spring. Just adding the @RunWith(SpringJUnit4ClassRunner.class) annotation can easily increase execution time for each of your test cases by as much as a second. This may not seem like much, but in a large suite with thousands of tests, it adds up to a significant amount of time. Maintaining several configuration files for production and test code also increases the amount of work it takes to implement changes and keep things clean: These configurations often develop a life of their own, where obsolete bean configurations continue to exist, beans live in different scopes, and unexpected side effects are introduced because several tests “reuse” the same configuration file.

A simple way to get around this problem, at least for unit tests, is to use Mockito’s @InjectMocks annotation to – well – inject any mocks you have configured into the class under test’s fields. This significantly reduces execution time, compared to using Spring’s JUnit runner, and your mock configuration actually ends up within the test, instead of somewhere else on the class path.

An even better way to do it is to declare dependencies explicitly, i.e. have a dedicated constructor or setter methods that you can call from your tests without having to use a third party framework at all.6
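Here is a minimal sketch of that approach (Mailer, GreetingService, and all names are invented for this example): the dependency is declared in the constructor, and the test passes in a tiny hand-rolled recording stub, with no framework involved.

```java
// Hypothetical collaborator interface - small enough that a lambda can stub it.
interface Mailer {
  void send(String to, String body);
}

// Hypothetical class under test: the explicit constructor IS the configuration.
class GreetingService {
  private final Mailer mailer;

  GreetingService(Mailer mailer) { this.mailer = mailer; }

  void greet(String recipient) {
    mailer.send(recipient, "Hello, " + recipient + "!");
  }
}

public class ExplicitDependencyExample {
  public static void main(String[] args) {
    // A minimal recording stub replaces any mocking framework here:
    // it simply captures the arguments of the last call.
    final String[] lastCall = new String[2];
    Mailer stub = (to, body) -> { lastCall[0] = to; lastCall[1] = body; };

    new GreetingService(stub).greet("alice");
    System.out.println(lastCall[1]); // prints Hello, alice!
  }
}
```

No runner, no context file, no classpath scanning – the whole “wiring” is visible inside the test method itself.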

6. Avoid test inheritance, if possible

Having many similar test cases often brings along the question of code reuse. And it is a good one… Shouldn’t we try to avoid duplication? Shouldn’t we make sure to use all the OO goodness at our disposal? After all, copy-pasting code from setUp() is just as bad as copy-pasting production code, isn’t it? Wouldn’t we have to maintain the same code in many places?

However valid these points may be – this is the one case where I would consider it a good idea to forgo DRY in favor of keeping tests as decoupled and independent as possible. If our tests depend on each other, we make it harder and more tedious to change our system. Just think about it: If we wanted to change the class hierarchy of our production code, we might suddenly have to do extensive refactoring of our tests. We don’t want our tests to introduce additional dependencies to our code base – we want them to help us get rid of those!
We can reduce the amount of duplicate code by using other means: We could extract extensive setUp code into helper classes, or use creational patterns to produce our test doubles and data objects, for example.7
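As a sketch of the creational-pattern idea (User and UserBuilder are invented names for this example), a small test-data builder keeps shared defaults in one place, while each test states only what is relevant to it:

```java
// Hypothetical data class produced by the builder below.
class User {
  final String name;
  final String password;
  User(String name, String password) { this.name = name; this.password = password; }
}

// Test-data builder: shared defaults live here instead of in a test base class,
// so tests stay independent of each other and of any class hierarchy.
class UserBuilder {
  private String name = "default-user";
  private String password = "secret";

  UserBuilder named(String name) { this.name = name; return this; }
  UserBuilder withoutPassword() { this.password = null; return this; }

  User build() { return new User(name, password); }
}

public class TestDataBuilderExample {
  public static void main(String[] args) {
    // The variable name documents the relevant property - see rule 4.
    User userWithoutPassword = new UserBuilder().named("alice").withoutPassword().build();
    System.out.println(userWithoutPassword.name); // prints alice
  }
}
```

Note how the builder also produces exactly the kind of meaningful variable names (userWithoutPassword) recommended above, without any inheritance between tests.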

Inheritance also makes the tests harder to understand. We will at least have to navigate the class hierarchy to understand what is happening, not to mention all the possible confusion inheritance brings along: Methods may be incorrectly overridden (see the Liskov Substitution Principle), there may be visibility issues, name shadowing, and so on.

And finally, inheritance also introduces a bunch of performance problems, specifically when working with JUnit.

As usual, there is an exception, though: If you need to test several concrete implementations of the same abstract class, I would consider it useful to mimic the same inheritance structure in your test code, i.e. create an abstract test case to do common setUp and tearDown, and to cover all the functionality of the abstract class, and then extend concrete tests for each implementation class from it.
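A minimal sketch of that structure (Shape, Square, and the test class names are all invented for this example; in real JUnit code the common check would be a @Test method in the abstract test class):

```java
// Hypothetical abstract class with behavior shared by all implementations.
abstract class Shape {
  abstract double area();
  boolean hasPositiveArea() { return area() > 0; } // common behavior
}

class Square extends Shape {
  private final double side;
  Square(double side) { this.side = side; }
  double area() { return side * side; }
}

// Abstract test case: covers the common behavior once; each concrete test
// only has to supply its own SUT via the factory method.
abstract class AbstractShapeTest {
  abstract Shape createSut();

  void shouldHavePositiveArea() {
    if (!createSut().hasPositiveArea()) throw new AssertionError("area not positive");
  }
}

class SquareTest extends AbstractShapeTest {
  Shape createSut() { return new Square(2.0); }
}

public class InheritedTestExample {
  public static void main(String[] args) {
    new SquareTest().shouldHavePositiveArea();
    System.out.println(new Square(2.0).area()); // prints 4.0
  }
}
```

Each new implementation of Shape then costs exactly one small test class that overrides createSut(), while all common behavior stays tested in one place.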

Wait… That’s all?

Of course not. There are lots of things I didn’t cover in this blog post: Which assertion framework to use, when to use which kind of test double, how to provide readable failure messages, … That’s all very exciting, and I am quite sure there will be at least one more article on testing soon. Until then, I am very much looking forward to reading your comments and suggestions. 🙂

Footnotes

  1. There are, of course, many other kinds of tests. But since this post centers around tests that could or should be written with JUnit, I’ll spare you the comprehensive list.
  2. This is not intended to be a post about code style. But you should really, always, no exceptions, put opening braces on the same line. Seriously.
  3. Should you ever lose your entire production code base (in the rather hypothetical event of, say, a strangely selective failure in your versioning system), it is not inconceivable that you will be able to recreate all of it in a very short period of time, if your test coverage is good: By going through the test suite and making all the little red lights go green again, you get a nice step-by-step guide to reimplementing the SUT. You may have guessed it: This doesn’t work the other way around. Extrapolating developers’ intentions and overall system concepts from production code is often tedious and takes a very, very long time.
  4. One of the most practical and efficient ways of keeping your code quality high is peer review. If you’re not doing this already, you want to start now.
  5. You can set breakpoints in your test code and step through the corresponding production code in the debugger. Try that with a UML diagram, I dare you.
  6. I know quite a lot of people who would argue vehemently against creating constructors or accessors “only” for use within tests. I, personally, don’t see anything wrong about creating explicit API that enables a more straightforward configuration. Moreover, there are quite a few people who would argue against relying too much on dependency injection frameworks. I suppose, the truly correct answer is the same as always: It depends™.
  7. This also applies to most of our production code, by the way.
