CB3: Test

This article is part of the CodeBoosting series, where we teach scientists how to make their code better and shareable.


Every developer of scientific software has been in the horrible situation of having a bug in the code and no idea how to fix it, a situation always made worse by a lack of time. In this article we will show you how to avoid such situations but, since errors and bugs happen anyway, we will also provide a simple, structured approach to explore the problem and eventually find the bug.

Using tests is fundamental to resolving issues in your code base quickly and efficiently. Several approaches are possible, and each fits a different use case.

In this blog post we will start by introducing what is meant by testing code, then we will explore two different kinds of tests, and finally we will look at different ways to ensure that your code works.

We will focus on methodologies that work well for scientists, while quickly covering those that work well for engineers.

The suggestions in this post are easy to apply if you have followed the two previous posts, so make sure to write “good” functions and to respect the Single Responsibility Principle (SRP).

What is a test

We consider a test to be any automatic procedure that verifies the correctness of a computer program.

The important part is “automatic”. If you run your software and manually verify that the output is correct, this is a manual test and we are not talking about those now.

While manual tests can be very important, they are carried out by humans, who eventually make errors, check the results too quickly, or do not check everything every time. The value of automatic tests is precisely that they are carried out by a computer, which does not make the same errors as humans.
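The difference between a manual and an automatic check can be sketched in a few lines. This is only an illustration: the helper count_words is hypothetical and is not part of the program discussed later in this post.

```python
def count_words(text):
    """Hypothetical helper used only for this illustration."""
    return len(text.split())


# Manual test: a person has to read the printed value and decide if it is right.
print(count_words("aaa bbb ccc"))

# Automatic test: the computer compares the result with the known answer
# and raises an AssertionError if they differ.
assert count_words("aaa bbb ccc") == 3
```

The second form can be re-run after every change without anybody having to look at the output.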

Integration Test

Integration tests are the slowest and most difficult type of test to write correctly. For small programs or analyses they are often overkill, since they can take a long time to run and are not trivial to write.

The goal of an integration test is to verify that the whole pipeline of the program works. Let’s look at an example.

Suppose you want to test a simple program that reads a text file, counts the occurrences of each word in the text, and writes the result to a sorted output text file in CSV format.

So if we pass as input a file that contains the text “aaa bbb ccc aaa bbb aaa”, the output file would be:

aaa,3
bbb,2
ccc,1


A possible integration test for this program would be to create a file and fill it with data such that we know the number of occurrences of each word in advance (after all, we created the file ourselves). Then we set up a script that runs our program against the known data set and checks the result. If the result is the one we expect, the test passes; if it is different, the test fails and our program has a problem.
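The procedure just described could look roughly like the sketch below. The function count_words_in_file is a hypothetical stand-in for the whole program, and the CSV layout (one word,count row per word, sorted by descending count) is an assumption made for the example, not something fixed by this post.

```python
import csv
import tempfile
import unittest
from collections import Counter
from pathlib import Path


def count_words_in_file(input_path, output_path):
    """Hypothetical stand-in for the whole program: read text, write word counts as CSV."""
    words = Path(input_path).read_text().split()
    counts = Counter(words)
    # Assumed ordering: descending count, then alphabetical, so the output is deterministic.
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    with open(output_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


class WordCountIntegrationTest(unittest.TestCase):
    def test_known_input_gives_known_output(self):
        with tempfile.TemporaryDirectory() as tmp:
            input_path = Path(tmp) / "input.txt"
            output_path = Path(tmp) / "output.csv"
            # We create the input ourselves, so we know the expected counts.
            input_path.write_text("aaa bbb ccc aaa bbb aaa")

            # Run the whole pipeline, from input file to output file.
            count_words_in_file(input_path, output_path)

            with open(output_path, newline="") as handle:
                rows = list(csv.reader(handle))
            # The csv module reads every field back as a string.
            self.assertEqual(rows, [["aaa", "3"], ["bbb", "2"], ["ccc", "1"]])
```

Saved in a file, a test like this can be run with python -m unittest. Note that it exercises reading, counting, sorting and writing all at once, which is exactly what makes it slower and more fragile than a unit test.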

You may find problems such as a wrong computation, but these kinds of tests are most useful for discovering integration problems between services. Maybe you were expecting to read a CSV file while the input file has no commas. Or maybe you write a value, such as an average, as an integer when it should be a float.

These kinds of tests are not so useful for scientific code, but if your software is more complex, talks to different databases, and communicates with different services, they are very useful.

A possible application of these tests is in big teams, or where a lot of data preparation is necessary. Following the SRP, you can write integration tests to make sure that your software correctly accepts all the possible inputs and correctly generates all the kinds of output.

Unit Test

Unit tests are the most useful in a scientific environment. They are quick and simple to set up and fast to execute, yet they provide a great way to make sure that your software is correct.

The goal of a unit test is to make sure that one “unit” of your software works. This rests on the assumption that if all the units of the software are correct, the whole software is correct. That is a big assumption to make but, nonetheless, unit tests are extremely useful and powerful.

Let’s go back to our simple program that counts the occurrences of each word. This program will most likely have a function split_string_into_words that, given some text as an input string, returns a list of the words in it. A possible unit test would check this property with a function like this:

import unittest

class CountWordTest(unittest.TestCase):
    def test_tokenizer(self):
        words = split_string_into_words("aaa bbb ccc")
        self.assertEqual(words, ["aaa", "bbb", "ccc"])

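In the same way, the program will likely have a function count_word_in_list that, given a list of words, returns a dictionary with the number of occurrences of each word. A second test in the same class could check it: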

    def test_count_word_in_list(self):
        word_in_list = count_word_in_list(["aaa", "bbb", "ccc", "aaa", "aaa"])
        self.assertEqual(word_in_list, {"aaa": 3, "bbb": 1, "ccc": 1})
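To actually run these two tests, the functions under test must exist. Their bodies are not shown in this post, so the implementations below are assumptions: a minimal sketch that is just enough to make both tests pass.

```python
import unittest
from collections import Counter


def split_string_into_words(text):
    # Assumed implementation: split on whitespace only.
    return text.split()


def count_word_in_list(words):
    # Counter is a dict subclass; convert it to a plain dict for a simple return type.
    return dict(Counter(words))


class CountWordTest(unittest.TestCase):
    def test_tokenizer(self):
        words = split_string_into_words("aaa bbb ccc")
        self.assertEqual(words, ["aaa", "bbb", "ccc"])

    def test_count_word_in_list(self):
        word_in_list = count_word_in_list(["aaa", "bbb", "ccc", "aaa", "aaa"])
        self.assertEqual(word_in_list, {"aaa": 3, "bbb": 1, "ccc": 1})
```

Each test touches exactly one function, so a failure points directly at the unit that is broken.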

I believe unit tests are the sweet spot for scientific code: they are a great tool to debug your code and make sure that it stays bug free, even in large collaborations.

Now that we know the two main classes of tests, let’s focus on unit tests, which are more useful in a scientific environment, and see how they can be used. We will start by exploring Test Driven Development (TDD), which I don’t find extremely useful in scientific applications. Then we will see how to use tests to debug our code easily and make sure it stays bug free.

Test Driven Development (TDD)

When developing software using TDD, we start by writing the test for our function before there is even a real implementation of the function itself.

We run the test to make sure that it fails. After all, we haven’t implemented the function yet, so if the test succeeds there is something wrong.

Only when we are sure that the test fails do we start implementing the function, in the simplest possible way that makes the test pass. When the test finally passes, we keep writing other tests, each adding an assumption about the behaviour of our function.
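One round of this cycle could look like the sketch below, using count_word_in_list from the unit test section as the function being developed. The order of the steps and the implementation are assumptions chosen for illustration.

```python
import unittest


# Step 1: the test is written first. While count_word_in_list does not exist,
# running this test fails with a NameError, which is the failure TDD expects.
class CountWordInListTest(unittest.TestCase):
    def test_single_word(self):
        self.assertEqual(count_word_in_list(["aaa"]), {"aaa": 1})

    # Step 3: once the first test passes, add tests for further assumptions.
    def test_repeated_word(self):
        self.assertEqual(
            count_word_in_list(["aaa", "bbb", "aaa"]), {"aaa": 2, "bbb": 1}
        )

    def test_empty_list(self):
        self.assertEqual(count_word_in_list([]), {})


# Step 2: the simplest implementation that makes the tests pass.
def count_word_in_list(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```

The function is defined after the test class on purpose, to mirror the order in which the code is written; Python only looks the name up when the test runs.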

This process produces a huge number of tests that cover all the assumptions in your code. This is very useful when you need to change your code: after each change you can re-run all the tests to make sure that you didn’t break anything.

However, it does require a lot of work that is not always necessary.

TDD is extremely useful when developing libraries that other people will use. All the tests help make sure that the library is stable; moreover, those tests can also serve as a rough form of documentation.

TDD helps make sure that your library stays bug free; indeed, you should commit your code only when all the tests pass.

Debugging with tests

Another approach, which I find more useful when developing scientific software, is to use tests for debugging.

If you have written good functions and followed the SRP, this approach will come naturally and will be very easy to implement.

As long as you haven’t found any problem in your code you can ignore tests. However, as soon as you need to debug, instead of manually trying different inputs, you write tests to verify your assumptions.

For example, suppose our function that counts the words in a string turns out to be buggy: with the input aaa bbb aaa,ccc it reports aaa only once and counts aaa,ccc as a single word.


At this point it is worth writing a test: does the function split_string_into_words return the correct output when the input is aaa bbb aaa,ccc?

class CountWordTest(unittest.TestCase):
    def test_tokenizer_with_comma(self):
        words = split_string_into_words("aaa bbb aaa,ccc")
        self.assertEqual(words, ["aaa", "bbb", "aaa", "ccc"])

############ RUN TEST ##############

FAIL: test_tokenizer_with_comma (__main__.CountWordTest)
Traceback (most recent call last):
  File "", line 4, in test_tokenizer_with_comma
    self.assertEqual(words, ["aaa", "bbb", "aaa", "ccc"])
AssertionError: Lists differ: ['aaa', 'bbb', 'aaa,ccc'] != ['aaa', 'bbb', 'aaa', 'ccc']

First differing element 2:
'aaa,ccc'
'aaa'

Second list contains 1 additional elements.
First extra element 3:
'ccc'

- ['aaa', 'bbb', 'aaa,ccc']
+ ['aaa', 'bbb', 'aaa', 'ccc']
?                    + ++

Great, we have found our problem. We can change our function, re-run the test to make sure that everything works, and then try running the whole program again.

def split_string_into_words(s):
    # Treat commas as separators, then split on any run of whitespace.
    return s.replace(",", " ").split()

############ RUN TEST ##############

Ran 1 test in 0.000s

Now we have fixed our code and added a new test. The new test makes sure that, even if we change the implementation of split_string_into_words, the same bug won’t reappear.

If the code is more complex, tests provide a structured way to find a problem. Start by writing a test for the big function and make sure it fails; then test each smaller function that it calls until you find the one that fails. At this point, either fix that function or, if it is still too complex, write more tests until you find the bug. Let’s see an example where the function foo() is broken.

def foo():
    a = make_a()
    b = get_b_from_a(a)
    c = generate_c(b)

At this point you can test the function make_a. If the test for make_a succeeds, you can move on and test get_b_from_a, and this time the test fails. Great, now we can look into it and write the necessary tests.

def get_b_from_a(a):
    h = create_h(a)
    return buggy_find_b_in_h(h)

At this point you can write a test for the create_h function to make sure it works as expected and, if that test passes, you can move on and test the suspicious buggy_find_b_in_h.

Debugging with tests is the sweet spot for scientific software. It allows you to write code very quickly while at the same time providing a structured approach to debugging and preventing the same bugs from coming back.


This article builds on the previous two: if you have written good functions and respected the Single Responsibility Principle, writing tests and debugging with them will come naturally, and it will help you a lot in finding bugs and making sure they don’t come back.

Other approaches to testing are possible. If you would like to hear more, simply comment on this post!


We publish new content each week; subscribe so you don't miss any article.