Overview

Sensible mutation testing: don’t go on a killing spree

No Comments

This is a follow-up to my earlier post about mutation testing (MT). To recap: MT helps you ensure that your unit tests are any good. The framework manipulates your compiled code by inserting small changes (mutants). It then re-runs your tests and expects the unit tests that touched the mutated code to fail. A failing test means the mutant was killed. This implies that your test suite not only covered the affected lines in the source code, but also threw an assertion. As you all know, test coverage means nothing without proper assertions.

Let’s have look at a concrete sample of a very simple business rule. You can find the code on https://github.com/jaspersprengers/pitestsample.git

public class Bouncer {
  enum Gender {
    MALE, FEMALE;
  }
 
  public boolean welcome(int age, Gender gender) {
    if (gender == Gender.FEMALE) {
      return age >= 18;
    } else
      return age >= 21;
    }
  }
}

Our bouncer lets in only men over 21 and women over 18. Now consider the following two unit tests:

@Test
public void test21YearOldMale() {
  assertThat(filter.welcome(21, MALE)).isTrue();
}
@Test
public void test19YearOldFemale() {
  assertThat(filter.welcome(19, FEMALE)).isTrue();
}

And let’s run the tests with coverage:

Screen Shot 2016-02-25 at 11.32.03
Cool! We have 100% rock solid coverage with only two tests. Let’s see what PIT has to say about this brave effort. If you run

mvn -DwithHistory clean install org.pitest:pitest-maven:mutationCoverage

on the command line from the project root you can find the test results in target/pit-reports.

PIT test coverage report

No so great… PIT inserted 19 mutants, of which seven were left to survive, despite our 100% coverage. Let’s take a detailed look.

Screen Shot 2016-02-25 at 10.47.48

It should not come as a surprise that our test suite is sadly inefficient:

  • We did not allow for the gender argument to be null.
  • We did not check for edge cases. Taking into account the nearest integer values to 18 and 21 (the values used in the comparisons) means we should check for 17,18,19,20,21 and 22.
  • In combination with the three possible values for gender this means we have 18 possible scenarios to test.

So to outsmart PIT we need to test for all these edge cases:

@Test
 public void testAllMales() {
 assertThat(filter.welcome(17, MALE)).isFalse();
 assertThat(filter.welcome(18, MALE)).isFalse();
 assertThat(filter.welcome(19, MALE)).isFalse();
 assertThat(filter.welcome(20, MALE)).isFalse();
 assertThat(filter.welcome(21, MALE)).isTrue();
 assertThat(filter.welcome(22, MALE)).isTrue();
 assertThat(filter.welcome(17, FEMALE)).isFalse();
 assertThat(filter.welcome(18, FEMALE)).isTrue();
 assertThat(filter.welcome(19, FEMALE)).isTrue();
 assertThat(filter.welcome(20, FEMALE)).isTrue();
 assertThat(filter.welcome(21, FEMALE)).isTrue();
 assertThat(filter.welcome(22, FEMALE)).isTrue();
 assertThat(filter.welcome(17, null)).isFalse();
 assertThat(filter.welcome(18, null)).isFalse();
 assertThat(filter.welcome(19, null)).isFalse();
 assertThat(filter.welcome(20, null)).isFalse();
 assertThat(filter.welcome(21, null)).isTrue();
 assertThat(filter.welcome(22, null)).isTrue();
 }

If we do that and rerun PIT all mutants are killed:

Screen Shot 2016-02-25 at 10.49.40

This example clearly shows how treacherous 100% coverage can be and how laborious it is to cover all edge cases and kill all mutants. We needed 18 test assertions to cover four lines of not very difficult code. Which begs the question…

Should all mutants be killed?

The short answer: no.

The long answer: it depends. It’s like insisting on 100% test coverage. Some code is more complex than other and is therefore more prone to errors. Complex algorithms and regular expressions are different from boilerplate statements like:

try {
  connection.close();
} catch(SQLException e){
  complain("checked exceptions: when did it every seem a good idea??");
}

To get 100% line coverage here would mean adding a test scenario that throws an error on connection.close() and verify the call to complain(..). If you haven’t got it covered, PIT will remove the call to ‘complain’ and complain in turn that you didn’t kill that mutant. By all means do, if your time and budget are limitless.

Remember that a mutant is only an instance in which an artificial change to your code was not detected by your tests. It might help you detect bugs in your code, but it doesn’t always make sense to write tests for them. MT can become extremely pedantic if you enable all possible mutation operators. Here’s a few that PITest supplies, but are disabled by default:

  • The constructor calls mutator mutates Thing t = new Thing() to Thing t = null;
  • The Non-void method call mutator replaced a call to that method with null (or the default primitive value). int i = calculate() becomes int i =0;

In the previous example I have enabled all operators that PIT has to offer. You can probably see why some are disabled. To harness against these mutants would litter your test code with superfluous null-checks.

Some code is not worth unit-testing

Feel free to disagree, but I’m thinking of configuration disguising as code: the stuff that used to go in XML, like the beloved Spring Security incantations:

protected void configure(HttpSecurity http) throws Exception {
    http
   .headers().disable()
   .authorizeRequests()
   .antMatchers("/v*/**/*").authenticated()
   .and()
   .httpBasic()
   .and()
   .userDetailsService(userService)
   .csrf().disable();
}

Yeah, you could mock HttpSecurity and all the other convenient ProviderBuilderConfigurerFactories inside it, but you’d do better to invest that time in a good integration test: a test that actually fires up a web container, makes calls through an http client and asserts those results. Not having any unit tests means of course that every mutation to your source code survives, since no test could kill them. Let them live.

Ignore non-business logic with predictable side-effects

Logging statements are somehow external to your business logic: they have side effects depending on various configuration parameters (log level, appenders, etc.) It makes no sense unit-testing those calls and therefore PIT by default does not mutate calls to the most popular frameworks: log4j, slf4j, java.util.logging (sorry: that was never really popular). You can further configure PIT to ignore calls to certain classes. They will nearly always be void method calls, or called in a void context.

Domain logic should not become overly defensive

It is a good architectural principle to make your outward-facing code (e.g. servlets/REST endpoints) extra defensive and picky in what it expects. Faulty user slipping through this layer  is a common source of errors deeper down, and you should allow for many test cases of the “unhappy flow” variety. This is where MT is helpful, by making sure you have covered all the edge cases. Otherwise your inventive users will discover them for you.

Your domain logic should be rigorously protected by this outside outward-facing layer, which makes sure no garbage goes in. If you stick to this principle, runtime exceptions in domain logic should be quite rare. Package-level methods in your domain layer that receive only input from guaranteed (i.e. your own) sources can relax and be more trusting. You don’t need the same lock on your bathroom that you have on your front door.

MT however makes no provision for all this. It is ignorant of the fact that some code receives more garbage and hence has to do a lot more error checking and validation. PIT may insist that you null-check the input to every private method, but you know it makes no sense. Let those mutants live.

Summary

Mutation testing is a very powerful tool to urge you to run that extra mile in unit-testing where it’s important. But it’s up to you to decide which code deserves that extra attention. After all, our time is precious. You could add MT to the suite of continuous integration steps and set a minimum threshold for mutants allowed to survive, but it’s up to you to decide what a sensible threshold would be.

Jasper joined codecentric NL in 2015 but has been coding since the early eighties. Having a background in English and linguistics, he always likes to stress the importance of human language in software.

Share on FacebookGoogle+Share on LinkedInTweet about this on TwitterShare on RedditDigg thisShare on StumbleUpon

Comment

Your email address will not be published. Required fields are marked *