Testing Scientific Code - How to Get Away with Bad Code

August 10, 2019

Scientific code is bad. Students and researchers use programming as a tool to bridge the gap between mathematical theory and predictive model. As a result, scientific code is frequently written by programming beginners and therefore oftentimes of bad quality.

If you ask me, there is nothing wrong with writing bad code. Writing bad code always seemed to be a part of evolving towards better programming. However, I can tell from personal experience that badly written code tends to break. Break a lot and unexpectedly. This is annoying if you have a deadline for your thesis or paper.

In this article, I will introduce test automation to you. It helps when trying to get away with badly written code. If you haven't used test automation in one of your projects so far, this article might be valuable to you. However, if you are already using test automation in your projects, it is likely that the information in this article will not be new to you.

I will focus on examples in the Python programming language. It's the language I know best and many people write scientific code with it.

Why do I need test automation?

Let's start with an example. Imagine you write scientific code in Python. You define functions and call them in the main function to have an executable script called transform.py.

    def main():
        print(second_transform("path/to/data"))

    def load_data(data_path):
        # some code to load data
        return data

    def first_transform(data_path):
        data = load_data(data_path)
        # some code to transform data
        return transformed_data

    def second_transform(data_path):
        data = first_transform(data_path)
        # some code to transform data again
        return twice_transformed_data

    if __name__ == '__main__':
        main()

This coding pattern is quite natural when tinkering around with scientific code. One defines a function that is called by another function. Then one gets a new idea and adds another function that builds on what was previously defined, and so on.

Imagine you run python3 transform.py in your terminal. You check the result, it seems plausible and you go on with your work. Two weeks later, you discuss the result with a colleague who comes up with a new transformation. You go ahead and write a new function third_transform.

    def main():
        print(third_transform("path/to/data"))

    def load_data(data_path):
        # (...)

    def third_transform(data_path):
        data = second_transform(data_path)
        # some code to transform data a third time
        return thrice_transformed_data

You run python3 transform.py. The code runs without producing an error and the result displayed on your screen is utter nonsense.

This is why you need test automation.

Debugging the hard way

There is a bug in the code. Maybe the newly introduced function third_transform is at the root of the problem. But are the returned objects from first_transform and second_transform actually what you thought they were? Maybe the loaded data from load_data is already broken.

You clench your teeth and start with load_data, load a working minimal example and print the output. Loading, printing, checking results. Debugging takes forever :(.

Debugging the easy way

Test automation frees you of the burden of manually loading minimal examples, printing and evaluating them. You just have to come up with such a minimal example once. Then you tell the machine what to check for in the result and you're done. What you end up doing is running python3 test_transform.py and checking the report for errors.

In case a test fails, you are shown what function in which test failed. When debugging, you're already set up with a minimal example - the test you wrote.
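To make the idea concrete, here is a minimal sketch of replacing "print and check by eye" with a check the machine performs. The function center and its behaviour are hypothetical stand-ins for one of your transformations, not part of transform.py.

```python
def center(values):
    """Hypothetical transformation: subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def check_center():
    # Minimal example, chosen once and small enough to verify by hand
    minimal_example = [1.0, 2.0, 3.0]
    result = center(minimal_example)
    # Instead of printing and eyeballing the result, state what it must be
    assert result == [-1.0, 0.0, 1.0], f"unexpected result: {result}"


# Raises an AssertionError if center ever stops behaving as expected
check_center()
```

If someone later changes center and breaks it, the check fails loudly instead of letting a wrong number slip through.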

How to write Tests in Python

Following its "batteries included" philosophy, the Python Standard Library has contained a module called unittest since Python Version 2.1. This means that you can directly go ahead and write a test without installing any additional software. Just create a file called test_transform.py in the same directory where transform.py is located. Then, import unittest and write your first test.

    import unittest
    import transform

    class TestTransform(unittest.TestCase):

        def setUp(self):
            """ Set up attributes used for testing """
            self.path_to_testdata = 'home/user/project/testdata.csv'

        def test_load_data(self):
            data = transform.load_data(self.path_to_testdata)
            self.assertIsNotNone(data)

        def test_first_transform(self):
            (...)

    if __name__ == '__main__':
        # Collect and run all tests when the file is executed as a script
        unittest.main()

The scheme is straightforward: Define a test case by subclassing unittest.TestCase and define individual tests as methods whose names start with the letters test. Have a look at this example in the official documentation for more information about the usage of unittest. If you now execute python3 test_transform.py within your terminal, the unittest.main() call at the end of the file collects and runs all tests automatically.
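The following is a self-contained sketch of the same scheme that you can run as-is. It assumes a hypothetical load_data that reads a CSV file of numbers, since the real one is not shown here. The test data is written to a temporary file in setUp, so the test does not depend on a path on your machine.

```python
import csv
import os
import tempfile
import unittest


def load_data(data_path):
    """Hypothetical stand-in for transform.load_data: read a CSV of numbers."""
    with open(data_path, newline="") as handle:
        return [[float(cell) for cell in row] for row in csv.reader(handle)]


class TestLoadData(unittest.TestCase):

    def setUp(self):
        # Write a tiny data file whose contents are known in advance
        handle = tempfile.NamedTemporaryFile(
            mode="w", suffix=".csv", delete=False, newline=""
        )
        with handle:
            csv.writer(handle).writerows([[1, 2], [3, 4]])
        self.path_to_testdata = handle.name
        # Remove the file again after each test, even if the test fails
        self.addCleanup(os.remove, self.path_to_testdata)

    def test_load_data_returns_expected_values(self):
        data = load_data(self.path_to_testdata)
        self.assertEqual(data, [[1.0, 2.0], [3.0, 4.0]])

    def test_load_data_missing_file_raises(self):
        with self.assertRaises(FileNotFoundError):
            load_data(self.path_to_testdata + ".missing")


# Run the test case explicitly; a script would typically call unittest.main()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLoadData)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Comparing against exact expected values with assertEqual catches more than assertIsNotNone does, because a function can return something that is not None and still be wrong.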

Unit Tests vs. Integration Tests

One can differentiate between what is called a unit test and an integration test. A unit test tests small atomic components of a program. One would create such a test for each function or class. In our example, test_first_transform, test_second_transform and test_third_transform would have been created.

In contrast to this, an integration test runs a whole chunk of your program and checks whether it performs as expected. Its purpose is to check the integration between different components. In this case, one would only create a test for third_transform, knowing that such a test calls all other functions of the program as well. If the result is not as expected, you can be sure that the program is broken, although the test alone does not tell you which function is at fault.
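As a rough sketch of the difference, consider a toy version of the chained functions from above. The transformations themselves (doubling, shifting, summing) and the in-memory load_data are assumptions made for illustration only.

```python
import unittest


def load_data(values):
    """Hypothetical stand-in for reading a data file."""
    return list(values)


def first_transform(values):
    data = load_data(values)
    return [2 * v for v in data]  # assumed transformation: double each value


def second_transform(values):
    data = first_transform(values)
    return [v + 1 for v in data]  # assumed transformation: shift by one


def third_transform(values):
    data = second_transform(values)
    return sum(data)  # assumed transformation: reduce to a single number


class TestUnits(unittest.TestCase):
    """Unit tests: one small check per function."""

    def test_first_transform(self):
        self.assertEqual(first_transform([1, 2, 3]), [2, 4, 6])

    def test_second_transform(self):
        self.assertEqual(second_transform([1, 2, 3]), [3, 5, 7])


class TestIntegration(unittest.TestCase):
    """Integration test: one check of the whole chain."""

    def test_whole_pipeline(self):
        self.assertEqual(third_transform([1, 2, 3]), 15)


# Run both test cases explicitly; a script would typically call unittest.main()
loader = unittest.defaultTestLoader
suite = unittest.TestSuite([
    loader.loadTestsFromTestCase(TestUnits),
    loader.loadTestsFromTestCase(TestIntegration),
])
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

If test_whole_pipeline fails while both unit tests pass, the problem most likely sits in third_transform itself, which narrows down the search considerably.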

Although it is nice to know that there is a difference in tests one can write, the distinction between unit tests and integration tests is supposedly important for larger projects. I learned about it after I started writing tests and never felt any difference in application.

Final thoughts

Test automation is definitely a practical tool that helps you spend less time debugging. Testing is sensible when working with a complex codebase. However, you probably won't need any tests when working on small projects.

Besides its use in debugging, testing will make your code easier to maintain. Someone who has never seen your code before now has a tool to check whether new modifications break the program. This is especially useful for someone who repeats your scientific experiment or wants to continue building on it.

Finally, I want to recommend some heuristics I found useful when writing tests:

Write a test while creating new components. I often demo what I want my components to do in a Python shell next to my text editor. When sketching this way, tests are oftentimes mere byproducts of this playing around with possible implementations.

Write tests when debugging. Testing doesn't prevent bugs from happening. Rather, it just helps with identifying some bugs when they occur. Although you will certainly spend time debugging, why not leverage the debugging you do to write even more tests? And once you've found the bug, write a test to prevent it from ever happening again!
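The second heuristic could look like the sketch below. The function normalize and the bug (a crash on all-zero input) are hypothetical. The point is only that the input which triggered the bug becomes a permanent test.

```python
import unittest


def normalize(values):
    """Hypothetical transformation: scale values so the largest magnitude is 1."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        # The fix: all-zero input used to raise ZeroDivisionError below
        return [0.0 for _ in values]
    return [v / peak for v in values]


class TestNormalizeRegression(unittest.TestCase):

    def test_all_zero_input_does_not_crash(self):
        # Reproduces the input that triggered the (hypothetical) bug
        self.assertEqual(normalize([0.0, 0.0]), [0.0, 0.0])

    def test_regular_input_still_works(self):
        # Guards against the fix breaking the ordinary case
        self.assertEqual(normalize([1.0, -2.0, 4.0]), [0.25, -0.5, 1.0])


# Run the test case explicitly; a script would typically call unittest.main()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalizeRegression)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Should a later change reintroduce the crash, the first test fails immediately and points straight at the cause.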

By Philipp Jung, data engineer and machine learning researcher.