Mutation Testing: Watching the Watchmen

25.1.2016 | 7 minutes of reading time

You can’t do without automated (unit) tests if you want to stay on top of the ever-increasing complexity of software projects. A mutation testing framework ‘watches the watchmen’ by inserting small changes into your compiled bytecode and then running your test suites against these intentional bugs. As a quality safeguard it is much more effective than traditional source code metrics such as code coverage. It is even a challenging way to improve your coding skills, and it makes writing test suites fun again.

As a largely self-taught programmer I am still thankful for the millennial internet boom, when a linguistics graduate could get hired with nothing more than a thirst for coding, and no professional experience or relevant diplomas to show for it. I like to think my coding skills have improved in the fifteen years since. Like most glorified amateurs, I started out with the anti-bureaucratic “just get it done, even if it requires duct tape” attitude to coding, where writing unit tests only gets in the way of shipping code. Read Joel Spolsky’s “The Duct Tape Programmer” about Netscape pioneer Jamie Zawinski. Nowadays I believe in the school of “You can have good, cheap or fast. Pick two.” I don’t do duct tape anymore, except for small home repairs.

If a job is worth doing, it is worth doing well. Good code written by fallible humans needs automated tests. Too bad it’s not really the most fun part of programming. For many who write code on a daily basis, test-driven development still doesn’t come naturally. Plenty of lip service is paid, but we’re just not that motivated to do it properly unless we’re harassed with minimum thresholds for test coverage. Tests feel like unexciting pieces of code making sure that other code which puts a blue ball in a red box has indeed put a blue ball in a red box, and not a green ball in an orange box. A pedantic, bureaucratic requirement. “I can write working code, you know. Let me just get on with making customers happy. They don’t care about stupid unit tests.”

That’s a dangerous attitude. Unit tests matter. Integration tests matter even more. You should get motivated. Test cases are a yardstick for quality while you’re designing your code, not something to add when you have time to spare. Code that cannot be properly unit-tested is a likely candidate for some serious overhaul. However, full rewrites are a waste of time and money. Agile development means software is continuously thought out, written down, re-thought and re-written until the user is happy or the money runs out, whichever comes first. Adopting such a just-in-time approach means that the code base is not only expanded with new code, but that existing code must also adapt to an increasingly complex architecture without breaking the functionality it already provides. This kind of change happens all the time, ideally from day one, but no later than day three. Good tests make this incremental refactoring possible. Bad tests make it impossible.

Greater minds than mine have written great books about the need for refactoring and why meaningful tests should cover your entire code base. I shouldn’t have to convince you. Let me just add my own argument here:

If you can’t test some piece of code, I don’t trust you to understand it.

Understanding how each line of code relates to the whole is a major challenge. Here’s something to put Moore’s Law into practical perspective. I am writing this post on a fancy MacBook with 16 GB of memory. In the early eighties I saved up my precious pocket money to buy a 16 KB memory module for my VIC-20 home computer. That’s a million times more memory in thirty years. Human short-term memory is stuck on a measly seven items and has probably been so since Socrates. Yes, we have better tools and faster compilers, but we have fixed-size brains that have to deal with ever more complex projects. The only way to comprehend these is to break them down into manageable, testable chunks.

[Image caption: The size of a hefty chocolate bar]

How do we make sure our tests are any good? “Code coverage”, I hear you say. True, that’s a useful criterion, usually expressed as the percentage of source code that is executed by the test suites, but coverage in itself is not enough. Test suites need to pummel your code with a wide range of sensible and wacky input parameters and assert the results. Ineffective, bogus unit tests that cover the source code and please the robot don’t really validate anything. Quantifying quality is not bad in itself, but insisting on a minimum percentage of test coverage without some assurance that your tests are actually any good only lulls you into a false sense of security, particularly if you have too many developers in your team with the duct-tape attitude I just described. If management is okay with the practice, I advise you to find another employer. If you’re okay with it, I urge you to change careers.
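To make the point concrete, here is a minimal sketch in plain Java (no JUnit, to keep it self-contained). The class `CoverageWithoutAssertions`, the method `total` and both “tests” are hypothetical. The bogus test executes every line of `total`, so a coverage tool would report 100% line coverage, yet it asserts nothing. It passes just as happily against a broken variant in which the multiplication has been swapped for a division, which is exactly the kind of change a mutation framework makes.

```java
import java.util.function.IntBinaryOperator;

public class CoverageWithoutAssertions {

    // Hypothetical production code: total price for a quantity of items.
    static int total(int unitPrice, int quantity) {
        return unitPrice * quantity;
    }

    // A deliberately broken variant, as a mutation tool might produce
    // by swapping the arithmetic operator (* becomes /).
    static int totalMutated(int unitPrice, int quantity) {
        return unitPrice / quantity;
    }

    // Bogus "test": it calls the code, covering every line, but asserts nothing.
    static boolean bogusTest(IntBinaryOperator impl) {
        impl.applyAsInt(10, 3);
        return true; // always green, whatever the result was
    }

    // Meaningful test: it checks the result against the expected value.
    static boolean realTest(IntBinaryOperator impl) {
        return impl.applyAsInt(10, 3) == 30;
    }

    public static void main(String[] args) {
        IntBinaryOperator correct = CoverageWithoutAssertions::total;
        IntBinaryOperator mutant = CoverageWithoutAssertions::totalMutated;

        System.out.println("bogus test, correct code: " + bogusTest(correct));
        System.out.println("bogus test, mutated code: " + bogusTest(mutant));
        System.out.println("real test, correct code: " + realTest(correct));
        System.out.println("real test, mutated code: " + realTest(mutant));
    }
}
```

Both versions of the code earn the same coverage figure from the bogus test; only the test with an assertion notices that the mutated version returns 3 instead of 30.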

You can (and probably should) have dedicated testers that put the finished product through rigorous manual testing. If they are worth their money they will find something that your test suites didn’t catch. But how much nicer it would be if there were an automated way to check if your tests are any good. To test your tests, so to speak. Mutation testing does just that. Mutation testing is based on a simple assumption: if your test suites fully validate the behaviour of your program, then changing the behaviour of the program by inserting significant changes should cause at least some of your tests to fail.

For those who need a car analogy: suppose the car is your source code, the test case involves doing a three-point turn and the JUnit runner is behind the wheel. The test succeeds if the car ends up pointing the other way unscathed. Mutation testing will do several evil things to your car and expects your tests to sniff them out. It will remove the battery: did you check that the engine has started? It will skew the rear-view mirror: did you adjust it for a clear view?

The framework inserts small but significant changes (‘mutants’) into your compiled code. Examples are swapping out arithmetic and comparison operators, making methods return null or simply removing method calls: basically anything that leaves the outward interface intact. Your unit tests should, however, detect the mutation and fail, thus killing the mutant. If not, the mutant has survived, indicating that your tests may be insufficient. Rather than code coverage alone, the percentage of mutants killed becomes the true indicator of quality.
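The following sketch imitates by hand what a mutation framework does automatically in bytecode. Everything in it is hypothetical: a method `isAdult(int age)` returning `age >= 18`, three hand-written mutants modelled as lambdas (a shifted boundary, a negated condition, and a constant return value as the boolean counterpart of returning null), and two small test suites. A mutant counts as killed when a suite fails against it, and the mutation score is the fraction of mutants killed. Both suites execute the entire method, so both would report full line coverage, but only the suite that probes the boundary kills every mutant.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

public class MutantKilling {

    // Hypothetical production code under test.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Hand-written stand-ins for mutants a framework would generate.
    static Map<String, IntPredicate> mutants() {
        Map<String, IntPredicate> m = new LinkedHashMap<>();
        m.put("boundary (>= becomes >)", age -> age > 18);
        m.put("negated conditional", age -> age < 18);
        m.put("constant return value", age -> true);
        return m;
    }

    // Weak suite: one adult, one child, nowhere near the boundary.
    static boolean weakSuite(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // Stronger suite: also pins down the behaviour exactly at the boundary.
    static boolean strongSuite(IntPredicate impl) {
        return impl.test(30) && !impl.test(5) && impl.test(18) && !impl.test(17);
    }

    // Fraction of mutants for which the suite fails, i.e. mutants killed.
    static double mutationScore(Predicate<IntPredicate> suite) {
        Map<String, IntPredicate> m = mutants();
        long killed = m.values().stream().filter(mutant -> !suite.test(mutant)).count();
        return (double) killed / m.size();
    }

    public static void main(String[] args) {
        // Sanity check: both suites must pass against the original code.
        System.out.println("weak passes original: " + weakSuite(MutantKilling::isAdult));
        System.out.println("strong passes original: " + strongSuite(MutantKilling::isAdult));

        for (Map.Entry<String, IntPredicate> e : mutants().entrySet()) {
            String weak = weakSuite(e.getValue()) ? "survived" : "killed";
            String strong = strongSuite(e.getValue()) ? "survived" : "killed";
            System.out.println(e.getKey() + ": weak " + weak + ", strong " + strong);
        }
        System.out.println("mutation score weak: " + mutationScore(MutantKilling::weakSuite));
        System.out.println("mutation score strong: " + mutationScore(MutantKilling::strongSuite));
    }
}
```

The weak suite lets the boundary mutant survive (a score of two out of three), while the strong suite kills all three. In a real project you would not write the mutants yourself: PIT derives them from the compiled classes and runs your existing unit tests against each one.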

A very useful mutation testing framework for Java is PIT (pitest.org). It integrates well with standard build tools and has simple but effective Eclipse and IntelliJ plugins. You can get up and running in a few minutes. PIT outputs a neat HTML report that puts code coverage next to mutation coverage and has a detailed view specifying which mutants have survived, i.e. were not caught by the test suite.

Is that a useful indicator? Can’t you just fool the machine like you can with code coverage? Not really. Mutation testing simply won’t let you get away with sloppy testing. Introducing a small but significant change in a class should break at least one test, i.e. kill the mutant. Mutants are more likely to survive when coverage is poor, but I have found plenty of surviving mutants in code I thought was well-tested. The PIT report neatly puts line coverage next to the percentage of mutants killed, and the latter is usually the lower figure.

Mutation testing forces you to take testing seriously. There’s no doubt about it. Integrating it into your daily routine adds a competitive element to the mix, which sharpens your instinct to code defensively and to write good tests. Test-driven development can feel like playing a game of chess with yourself. You write some code, and then you write a test for it. It’s hardly a victory to see a green test suite. Now that mutants rear their ugly heads, writing these tests becomes a lot less predictable, but also more challenging. Any lazy bum can do 100% code coverage. It’s easy. Killing all the mutants in an intricate piece of source code, now that’s a sweet victory.

In a follow-up post, “Sensible mutation testing: don’t go on a killing spree”, I will go into more detail on how to incorporate mutation testing sensibly and productively.

Blog author

Jasper Sprengers
