Improving test quality with mutation testing

Quis custodiet ipsos custodes? This is a Latin phrase from the Roman poet Juvenal, which means "Who will guard the guards themselves?" It's a good question. Is the law truly equal for everyone? Do the guards apply the law correctly?

In software development, the guards are the tests. Yes, tests must ensure that our code meets business requirements, meaning our code does what it is supposed to do. Tests are also the guardians who must watch that we don't “break” anything when we make a modification; they must warn us.

But how can we ensure that we have enough tests? One option is to measure code coverage:
In software engineering, code coverage, also called test coverage, is a percentage measure that indicates the degree to which the source code of a program is executed when executing a given set of tests.

As can be seen from the definition, code coverage only measures the percentage of lines that have been executed, not the quality of the tests. I have seen many projects with almost 100% code coverage, but most of the tests were irrelevant or not very useful.

Quantity and quality are different things. But how can we detect the quality of our tests? And who controls whether our tests (the guards) perform their task well? Mutation Testing helps us with this task.

What is mutation testing?

The idea of mutation testing is simple: modify the code covered by tests and check whether the existing test suite detects and rejects the modifications. It is used to design new tests and to evaluate the quality of existing ones. What do poor-quality tests look like?

Tests without assertions, which do not verify any result.
Lack of clarity about what is being tested.
Dependency between tests, which can generate cascading failures.
Testing more than one thing in the same test.
Skipping tests for hard-to-test code, leaving potential errors undetected.
Unstable tests that fail randomly without a clear cause.
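
To make the first item in the list concrete, here is a minimal sketch of a test that executes code without verifying anything, next to one that checks the result. The class, the applyDiscount method, and the plain boolean "tests" are hypothetical stand-ins for a real test framework.

```java
// Hypothetical example: plain Java methods stand in for framework tests.
class CoverageWithoutAssertions {

    // Code under test (invented for this illustration).
    static int applyDiscount(int price, int percent) {
        return price - price * percent / 100;
    }

    // Poor test: runs the code, so it counts toward coverage, but checks nothing.
    static boolean testWithoutAssertion() {
        applyDiscount(200, 10);
        return true; // always "passes", whatever applyDiscount returns
    }

    // Better test: fails if the computed price is wrong.
    static boolean testWithAssertion() {
        return applyDiscount(200, 10) == 180;
    }

    public static void main(String[] args) {
        System.out.println("without assertion passes: " + testWithoutAssertion());
        System.out.println("with assertion passes: " + testWithAssertion());
    }
}
```

Both tests give the same line coverage, but only the second one would notice a change in the calculation.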

Underlying assumptions

Mutation testing is based on two ideas:

The first is the competent programmer hypothesis: programmers do their job in the best possible way, so the errors they make tend to be small slips rather than flaws in the program's structure.
The second is the coupling effect: small errors tend to propagate and generate more complex failures in the code, so by detecting these small errors with mutation tests, more serious defects can also be discovered.

Basic concepts

Mutation operators/mutators

A mutator is the operation applied to the original code. Basic examples include changing a '>' operator to a '<', substituting '&&' operators for '||', and substituting other mathematical operators.
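
As an illustration of the two basic mutators just mentioned, the following sketch writes the mutated versions by hand next to the original code. The methods are hypothetical; a real tool would generate the mutants automatically instead of requiring them in the source.

```java
// Hand-written illustration of two mutators; all names are hypothetical.
class BasicMutatorsSketch {

    // Original code.
    static boolean isPositive(int n) {
        return n > 0;
    }

    // Mutant: '>' changed to '<'.
    static boolean isPositiveMutant(int n) {
        return n < 0;
    }

    // Original code.
    static boolean inRange(int n) {
        return n >= 1 && n <= 100;
    }

    // Mutant: '&&' replaced with '||'.
    static boolean inRangeMutant(int n) {
        return n >= 1 || n <= 100;
    }

    public static void main(String[] args) {
        System.out.println(isPositive(5) + " vs " + isPositiveMutant(5));
        System.out.println(inRange(0) + " vs " + inRangeMutant(0));
    }
}
```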

Mutants

A mutant is the result of applying the mutator to an entity. A mutant is a modification of the code at runtime that will be used during the execution of the test suite.

Killed/surviving mutations

When the test suite is executed against the mutated code, there are two possible outcomes for each mutant: it is killed or it survives. A killed mutant means that at least one test failed as a result of the mutation. A surviving mutant means that our test suite did not detect the mutation and therefore needs to be improved.
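
The difference between a killed and a surviving mutant can be sketched with a single boundary mutation. This is only an illustration: the implementation is passed in as a function so the same "test" can run against the original and the mutant, and every name here is hypothetical.

```java
import java.util.function.IntPredicate;

// Illustration only: tests are modelled as methods that receive the implementation.
class KilledOrSurvived {

    static final IntPredicate ORIGINAL = n -> n > 0;
    static final IntPredicate MUTANT = n -> n >= 0; // boundary changed from '>' to '>='

    // Weak test: only checks a value far away from the boundary.
    static boolean weakTest(IntPredicate isPositive) {
        return isPositive.test(5);
    }

    // Boundary test: checks the value where original and mutant differ.
    static boolean boundaryTest(IntPredicate isPositive) {
        return !isPositive.test(0);
    }

    public static void main(String[] args) {
        // The weak test passes for both versions, so on its own the mutant survives.
        System.out.println("weak test on mutant passes: " + weakTest(MUTANT));
        // The boundary test fails for the mutant, so the mutant is killed.
        System.out.println("boundary test on mutant passes: " + boundaryTest(MUTANT));
    }
}
```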

How does mutation testing work?

Since mutation testing validates the quality of our tests, before launching the mutation tests, the tests are executed and must pass. Otherwise, you cannot proceed.

If all tests pass correctly, the mutation testing library begins to create mutations. For each mutation, all tests are executed.

Let's look at an example of 10 tests for which 5 mutants can be created.

All tests are executed. If all pass, proceed to step 2.
The first mutant is created, and all tests are executed again. As we see in the following image, the third test fails. This means the mutant was detected and killed.

Image: table with tests and results; the third test fails.

👍 If our tests fail after the mutation, then we can say that the mutation was detected and killed.

The second mutant is created, and all tests are executed again. This time all tests pass without failing, and the mutant was not detected. Therefore, the mutant survived.

Image: the second mutant is launched and all tests pass; the result is shown as "survived".

The quality of the tests is measured based on the percentage of killed mutations. Mutation tests check if the tests are effective.
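
The whole process, from the initial green run to the final percentage, can be sketched in a few lines. This is a simplified model under stated assumptions, not how Pitest is implemented: the mutants are written by hand, the "tests" are plain predicates, and all names and numbers are hypothetical.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

// Simplified model of a mutation testing run; not a real tool.
class MutationRunSketch {

    // The code under test, modelled as a function so mutants can be swapped in.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a + b;

    // Hand-written stand-ins for the mutants a tool would generate.
    static final List<IntBinaryOperator> MUTANTS = List.of(
            (a, b) -> a - b,      // '+' replaced with '-'
            (a, b) -> a * b,      // '+' replaced with '*'
            (a, b) -> a + b + 1,  // result incremented
            (a, b) -> b + a);     // behaves exactly like the original

    // Each "test" returns true when it passes for the given implementation.
    static final List<Predicate<IntBinaryOperator>> TESTS = List.of(
            add -> add.applyAsInt(2, 2) == 4,
            add -> add.applyAsInt(0, 5) == 5);

    static boolean allTestsPass(IntBinaryOperator implementation) {
        return TESTS.stream().allMatch(test -> test.test(implementation));
    }

    // Runs every test against every mutant and counts the killed ones.
    static int countKilled() {
        if (!allTestsPass(ORIGINAL)) {
            throw new IllegalStateException("Tests must pass on the original code first");
        }
        int killed = 0;
        for (IntBinaryOperator mutant : MUTANTS) {
            if (!allTestsPass(mutant)) {
                killed++; // at least one test failed: the mutant was detected
            }
        }
        return killed;
    }

    // Mutation score: percentage of mutants that were killed.
    static double mutationScore(int killed, int total) {
        return 100.0 * killed / total;
    }

    public static void main(String[] args) {
        int killed = countKilled();
        System.out.println("killed " + killed + " of " + MUTANTS.size());
        System.out.println("score: " + mutationScore(killed, MUTANTS.size()) + "%");
    }
}
```

The last mutant in the list cannot be killed by any test because it behaves like the original, which is why a score below 100% is not always a sign of weak tests.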

These are some automated tools for mutation testing:

Pitest for Java.
Stryker Mutator for JavaScript, C# and Scala.
mutmut for Python.
Infection for PHP.
Mutant for Ruby.

Mutation testing example with Fizz Buzz Kata

Let's see it with an example of a kata called Fizz Buzz. The requirements of the kata are simple:

Write a Java program that prints a line for each number from 1 to 100.
For multiples of three, print Fizz instead of the number.
For multiples of five, print Buzz instead of the number.
For numbers that are multiples of both three and five, print FizzBuzz instead of the number.

Here is our code for FizzBuzz.java:

public class FizzBuzz {

   public String convert(int number) {
       if (isDivisibleBy(3, number)) {
           return "Fizz";
       }

       if (isDivisibleBy(5, number)) {
           return "Buzz";
       }

       if (isDivisibleBy(15, number)) {
           return "Fizz";
       }

       return String.valueOf(number);
   }

   private boolean isDivisibleBy(int divisor, int number) {
       return number % divisor == 0;
   }
}

And this is our FizzBuzzTest.java:

// Assumption: the fluent assertThat(...) used below comes from AssertJ.
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;
import org.junit.jupiter.params.provider.ValueSource;

class FizzBuzzTest {

   private FizzBuzz fizzBuzz;

   @BeforeEach
   void setUp() {
       this.fizzBuzz = new FizzBuzz();
   }

   @ParameterizedTest
   @CsvSource({"1,1", "2,2", "4,4"})
   void convert_regular_number_to_string(int input, String expected) {
       String actual = fizzBuzz.convert(input);

       assertThat(actual).isEqualTo(expected);
   }

   @ParameterizedTest
   @ValueSource(ints = {3, 6, 9})
   void convert_numbers_divisible_by_3_and_not_divisible_by_5_to_Fizz(int input) {
       String actual = fizzBuzz.convert(input);

       assertThat(actual).isEqualTo("Fizz");
   }

   @ParameterizedTest
   @ValueSource(ints = {5, 10, 20})
   void convert_numbers_divisible_by_5_and_not_divisible_by_3_to_Buzz(int input) {
       String actual = fizzBuzz.convert(input);

       assertThat(actual).isEqualTo("Buzz");
   }

   @ParameterizedTest
   @ValueSource(ints = {15, 30, 45})
   void convert_numbers_divisible_by_15_to_Fizz(int input) {
       String actual = fizzBuzz.convert(input);

       assertThat(actual).isEqualTo("Fizz");
   }
}

I'm using Maven in this example, but if you want to use it with Gradle, you can follow this tutorial: Gradle quick start.

Reviewing the results

We test if it works by executing the following commands:

mvn clean test 
mvn pitest:mutationCoverage

It's necessary to execute both commands in order, because Pitest works with compiled code. So, if the code and tests aren't compiled, you won't see the result of the last changes we've made. If you haven't made any changes, it's enough to execute the second command. A faster command can be:

mvn clean compile test-compile 
mvn pitest:mutationCoverage

But personally, I like fast feedback, so I prefer to run the tests first and then mutationCoverage. The build fails with the following error:

The mutation score is 90%, but it must be at least 95%. Let's review the results in the Pitest report. We open the index.html file, which can be found in the folder target -> pit-reports.

Image: screenshot showing where the index.html file is located inside the pit-reports folder.

We open it in our favorite browser and navigate through the report until we reach FizzBuzz.java:

Image: Pitest results for FizzBuzz.java.

We see that on line 15 there is a mutant that has survived. Below we can see the mutations:

Image: list of mutations for FizzBuzz.java.

We see that the sixth mutation consists of changing the return value to an empty string "" and the mutant has not been detected by the tests.

Improving our code using the Pitest report results

If we review the code and look at line 15, we see that it can never be reached: if a number is divisible by 15, it is also divisible by 3, so it is already caught by the earlier divisible-by-3 check on line 7, which returns "Fizz".
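
A quick way to convince ourselves of this is to replay the order of the checks for every number in the kata's range and count how often the third check would actually be reached. The helper below is a hypothetical sketch that mirrors the order of the conditions in convert; it is not part of the original project.

```java
// Hypothetical helper that mirrors the order of checks in the first version of convert.
class UnreachableBranchCheck {

    // Returns true only if the divisible-by-15 check would be the one that matches.
    static boolean reachesFifteenBranch(int number) {
        if (number % 3 == 0) {
            return false; // the divisible-by-3 check matches first
        }
        if (number % 5 == 0) {
            return false; // the divisible-by-5 check matches first
        }
        return number % 15 == 0;
    }

    static int countReaching(int from, int to) {
        int count = 0;
        for (int n = from; n <= to; n++) {
            if (reachesFifteenBranch(n)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("numbers reaching the third check: " + countReaching(1, 100));
    }
}
```

Because no input reaches that return statement, no test can observe a change to it, which is why the mutant on that line survives.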

We move the divisible-by-15 check so that it becomes the first instruction of our method convert(int number):

public class FizzBuzz {

   public String convert(int number) {
       if (isDivisibleBy(15, number)) {
           return "Fizz";
       }


       if (isDivisibleBy(3, number)) {
           return "Fizz";
       }

       if (isDivisibleBy(5, number)) {
           return "Buzz";
       }

       return String.valueOf(number);
   }

   private boolean isDivisibleBy(int divisor, int number) {
       return number % divisor == 0;
   }
}

We run the mutation tests again, they pass correctly, and the build finishes. We review the report and see that the mutation score is 100%.

Image: Pitest report results showing a 100% mutation score.

Reviewing the changes, we detect that, in our haste, we had made a mistake. For values divisible by three and five, we returned Fizz instead of FizzBuzz. Let's fix it in the code and tests and see that everything works correctly.
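
The fix might look like the sketch below. The class name FizzBuzzFixed is used here only to distinguish it from the versions shown earlier; the author's final code is in the repository and may differ. The test for multiples of 15 would also need to expect "FizzBuzz".

```java
// Sketch of the corrected class; the name FizzBuzzFixed is hypothetical.
class FizzBuzzFixed {

    public String convert(int number) {
        // The most specific case goes first so the other checks cannot shadow it.
        if (isDivisibleBy(15, number)) {
            return "FizzBuzz";
        }

        if (isDivisibleBy(3, number)) {
            return "Fizz";
        }

        if (isDivisibleBy(5, number)) {
            return "Buzz";
        }

        return String.valueOf(number);
    }

    private boolean isDivisibleBy(int divisor, int number) {
        return number % divisor == 0;
    }

    public static void main(String[] args) {
        FizzBuzzFixed fizzBuzz = new FizzBuzzFixed();
        for (int n : new int[] {3, 5, 7, 15}) {
            System.out.println(n + " -> " + fizzBuzz.convert(n));
        }
    }
}
```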

Thanks to mutation tests, we can detect small, unintentional errors.

What do we do with surviving mutants?

Analyzing surviving mutants is fundamental for improving code quality. While some reveal significant problems in the code or tests, others represent equivalent mutations or simply noise. However, all provide valuable information about the effectiveness of the tests.

Types of surviving mutants

We can divide surviving mutants into three categories:

Noisy mutants.
Mutants that cannot be killed.
Mutants with valuable information.

Noisy mutants

In this category we can include:

getters & setters
custom hashCode
custom toString
custom equals
autogenerated code, for example by OpenAPI codegen, Lombok, MapStruct…

This code, in most cases, is automatically generated with the IDE or using Lombok, so it doesn't add much value if we perform mutation tests on it. It should be excluded from mutation coverage.

Mutants that cannot be killed

This group of mutants gives us valuable information for refactoring. These mutants can point to:

“Dead code” or useless code that is never called.
Code that only affects performance.
Code that only affects internal state.
Logic in another part of the code.

Mutants with valuable information

These are mutants that reveal real, significant problems in our code or tests. We must pay attention to them and solve the problems they show us.

Mutation operators

The number of possible changes is practically unlimited and grows with the size of our code. These are some of the mutators:

Void method mutator

If we have a method that returns nothing, it is a method with a “side effect”, meaning it changes global state or something in the infrastructure. For this reason, Pitest removes calls to such methods to see if the tests fail.
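
A minimal sketch of what such a mutation means for the tests: once the side effect disappears, only a test that checks the side effect can notice. The AuditLog class and the register methods are hypothetical, and the mutant is written by hand for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of a void method mutation; all names are hypothetical.
class VoidMethodMutatorSketch {

    static final class AuditLog {
        private final List<String> entries = new ArrayList<>();

        // Void method: its only purpose is the side effect.
        void record(String entry) {
            entries.add(entry);
        }

        int size() {
            return entries.size();
        }
    }

    // Original code: returns the cleaned name and records the registration.
    static String register(String name, AuditLog log) {
        log.record("registered " + name);
        return name.trim();
    }

    // Mutant: the void call is gone, the return value is unchanged.
    static String registerMutant(String name, AuditLog log) {
        return name.trim();
    }

    public static void main(String[] args) {
        AuditLog original = new AuditLog();
        AuditLog mutated = new AuditLog();
        register(" ana ", original);
        registerMutant(" ana ", mutated);
        // A test that only checks the return value cannot tell the two apart;
        // a test that checks the log size can.
        System.out.println(original.size() + " vs " + mutated.size());
    }
}
```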

Null return mutator

This mutator changes a "return new" statement to "return null" in the code.
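
As a hedged illustration, the sketch below writes the null-returning mutant by hand and runs two hypothetical "tests" against it: one that only calls the method and one that checks the returned object. None of these names come from the original project.

```java
import java.util.function.Supplier;

// Hand-written illustration of a null return mutation; all names are hypothetical.
class NullReturnMutatorSketch {

    // Original code.
    static StringBuilder createBuffer() {
        return new StringBuilder();
    }

    // Mutant: the created object is replaced with null.
    static StringBuilder createBufferMutant() {
        return null;
    }

    // Weak test: calls the method but ignores the result, so the mutant survives.
    static boolean weakTest(Supplier<StringBuilder> factory) {
        factory.get();
        return true;
    }

    // Stricter test: checks the returned object, so the mutant is killed.
    static boolean strictTest(Supplier<StringBuilder> factory) {
        return factory.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("weak test on mutant passes: "
                + weakTest(NullReturnMutatorSketch::createBufferMutant));
        System.out.println("strict test on mutant passes: "
                + strictTest(NullReturnMutatorSketch::createBufferMutant));
    }
}
```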

Constant mutator

removes if field and replaces it with "return 3"

Optional mutator

The effective implementation of mutation testing

For mutation testing to be possible, our tests must meet the following requirements:

They must have the same result every time.
They must be very fast.
They can be executed in any order.
They can be executed in parallel.

Unit tests meet these requirements, so we should exclude the following tests from mutation testing:

Integration and E2E tests.
Performance tests.
Contract tests.
Any test that changes global state…

Conclusions

Mutation testing is an advanced validation technique that helps ensure code quality by evaluating the effectiveness of existing tests. Its implementation provides several key benefits:

Greater test reliability. By detecting weak or ineffective tests, mutation tests reduce the risk of code failing without being detected by tests, thus strengthening the software's safety net.
Promotion of good development practices. The need to improve tests creates positive pressure to reduce unnecessary code, minimize duplication, and write more effective tests.
Ease of adoption. Tools like Pitest allow for simple installation, execution, and interpretation of results, which facilitates its integration into the development and testing workflow.

In summary, mutation testing is a valuable technique that elevates software quality by strengthening its testing system, fostering more efficient code, and ensuring that errors are detected before reaching production. You can see all the code with the step-by-step commits in the GitHub repository fizzbuzz-mutation-testing.

Ismail Ahmedov

Hands-on Software Architect, lifelong learner, and trainer. I promote best practices in software development, the S.O.L.I.D principles, TDD, DDD, and DevOps culture within teams. I’m always ready to take on new challenges and step out of my comfort zone to learn something new.
