Containers for reproducibility
Keywords: Container
Containers, or more precisely container images, are among the most frequently cited tools in the context of reproducibility.
Container images are intended for deploying software, and that's what container runtime engines such as Docker are designed and used for outside of research. A container image wraps a complete software environment into a single file. If two people run a piece of software from the same container image, they should (and normally do) obtain the same results. That's good for reproducibility.
There are a few caveats though. The most important one is that many container runtimes are designed for cloud computing, not for deploying software on a personal computer under a regular user account, as in a typical research setting. You can do it, but it requires a couple of command-line options that link the container to files on your computer. This entails the risk that the computed results depend on your local files in a way you didn't intend. That's why you still need to verify reproducibility on a different machine when you work with container images.
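As a minimal sketch of what "linking the container to local files" looks like in practice: the image name, script name, and paths below are hypothetical placeholders, not taken from any specific paper.

```shell
# Hypothetical example: run an analysis from a container image on a
# personal machine, giving it access to local files via a bind mount.

# --volume makes the current directory visible inside the container
# at /data; --user runs the process as the invoking user so that
# output files are owned by you rather than by root.
docker run --rm \
  --volume "$PWD":/data \
  --user "$(id -u):$(id -g)" \
  myimage analysis-script /data/input.csv /data/figure.png
```

Note the risk mentioned above: everything under `$PWD` is now visible inside the container, so the computation can silently pick up local configuration or data files you never meant to include.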
A more subtle, and in the long run more serious, issue with container images is that they provide only a very shallow form of reproducibility, which in research practice is usually not sufficient.
Suppose you see a figure in some paper, with the promise that you can reproduce it by running a given command line inside a given container. What have you actually gained from this promise (and the image that goes with it)?
Given just the figure, you already knew that there exists a program that produces this figure. With the container image, that program is in your hands. You can re-run it. But you cannot inspect it, nor modify it to explore variants of the paper's research question. The paper may claim that it was derived from some cited source code, but you cannot verify that either. A container image is the proverbial black box.
Re-running a computation identically is usually just the first step in a research project building on published results. If the re-run is successful, you will want to vary parameters and algorithms to do slightly different things. You will also want to inspect the code, to understand the implemented methods in more detail.
That's why container images are helpful for reproducibility, but not sufficient. Your peers must be able to reconstruct the container image from source code, including the possibility of modifying the source code in the process. Ideally, they should be able to rebuild the image identically, reproducibly. Only then can they be sure that you actually used the source code that you claimed you were using.
It is possible to write recipes for the reproducible construction of container images, but it is not easy. You have to watch out for every single ingredient, making sure that it is either source code or built reproducibly from source code. Most published recipes are not reproducible. The MOOC "Reproducible Research II: Practices and tools for managing computations and data" demonstrates two possible approaches for reproducible container construction, one based on Debian snapshots and one based on Guix.
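To illustrate the Debian-snapshot approach mentioned above, here is a minimal sketch of a container recipe; the snapshot date, base-image tag, and package list are hypothetical placeholders, not taken from the MOOC material.

```dockerfile
# Hypothetical sketch: pin all Debian packages to one frozen snapshot
# of the archive, so that a later rebuild resolves to the same binaries.

FROM debian:12.4

# Point APT at snapshot.debian.org, an immutable copy of the Debian
# archive for a specific date, instead of the moving "latest" mirrors.
RUN echo "deb [check-valid-until=no] \
      https://snapshot.debian.org/archive/debian/20240101T000000Z \
      bookworm main" > /etc/apt/sources.list

# Every package installed here comes from the frozen snapshot, which
# never changes, so the installed environment is the same on rebuild.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-numpy && \
    rm -rf /var/lib/apt/lists/*
```

Note that even this sketch is not fully watertight: the base-image tag is itself a moving ingredient unless it is pinned by digest, which is exactly the kind of hidden ingredient the paragraph above warns about.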