Evolving, frozen, and reproducible environments

Keywords: Container image

Computational environments are assembled from software, and software evolves over time. This is one of the core issues in computational reproducibility. There are three common approaches to handling the evolution of software in computational environments. Each of them is appropriate in some contexts.

An evolving environment is one whose components are updated regularly. That is how most people manage the default work environment on their computers. Regular updates deliver bug fixes, in particular for security issues, and new functionality that is often welcome.

A frozen environment is a snapshot of an environment that can be activated identically at any point in time. It is often stored in a file, in the form of a container image or a virtual machine image, for distribution and archiving. Frozen environments are often used in collaborations, to ensure that everyone uses the same software. They are also often shared publicly, to facilitate the deployment of some piece of software.
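When a frozen environment is distributed or archived as a file, one simple way to confirm that everyone activates the same snapshot is to compare a cryptographic digest of the file with a recorded value. The following is a minimal sketch of that idea, not a feature of any particular container tool. The file name `environment.img` and the byte strings are hypothetical stand-ins for a real image.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical image file; a real one would be a container or VM image.
    image = Path(tmp) / "environment.img"
    image.write_bytes(b"frozen environment contents")

    # Digest recorded when the environment was frozen.
    recorded = file_digest(image)

    # Later check: same bytes, same digest.
    unchanged = file_digest(image) == recorded

    # Any modification of the file changes the digest.
    image.write_bytes(b"frozen environment contents, modified")
    modified = file_digest(image) == recorded

print(unchanged, modified)  # True False
```

A matching digest shows only that the file is the same snapshot. It says nothing about which components the snapshot contains or how it was built.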

A reproducible environment can be reconstructed identically from the source code of its components, at any time and also on a different (but sufficiently similar) computer.

A reproducible environment is necessarily also a frozen environment. Evolving environments could be reproduced in theory, by restoring the initial state and then applying all the updates again, one by one. In practice, this never works because neither the initial state nor the update steps are recorded in sufficient detail.

A frequent mistake is to consider a frozen environment, such as a container image, to be reproducible. A frozen environment can of course be re-used identically at any time, on the same or on a different computer. But if it cannot be reconstructed from the source code of its components, there is no way to be certain what exactly these components are. The creator of the image may well provide a list of the components with version numbers and references to the source code, but there is no way to check that this list is accurate and complete. In fact, experience has shown that manually written lists of components are rarely accurate, and never complete. Lists made with the help of a package manager are not guaranteed to be accurate and complete either: version numbers can be missing or imprecise, and low-level tools required for constructing the environment may be left out. This is a common issue with, for example, the package manager conda, whose environment specifications can rarely be rebuilt more than a few months later because low-level build tools are absent from them.
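The gap between a package list and a complete description can be illustrated with Python's own packaging metadata. The sketch below uses only the standard library. The function names `python_level_inventory` and `outside_the_inventory` are made up for this illustration, and the second function shows just a few examples of what such a list omits.

```python
import platform
from importlib.metadata import distributions


def python_level_inventory():
    """Return {name: version} for the distributions visible to this interpreter."""
    inventory = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        version = dist.version
        # Installations with broken metadata may lack a name or version.
        if name and version:
            inventory[name] = version
    return inventory


def outside_the_inventory():
    """A few properties of the environment that the package list does not record."""
    libc = " ".join(part for part in platform.libc_ver() if part) or "unknown"
    return {
        "python_version": platform.python_version(),
        "python_compiler": platform.python_compiler(),
        "machine": platform.machine(),
        "libc": libc,
    }


inventory = python_level_inventory()
extras = outside_the_inventory()

print(len(inventory), "distributions listed")
for key, value in extras.items():
    print(f"not in the list: {key} = {value}")
```

Even the second dictionary is far from exhaustive: the compilers and libraries used to build each installed package, for instance, appear in neither.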

Another limitation of frozen but not reproducible environments is that they cannot be modified in a controlled way. Suppose you want to change just one component, to test whether a different version leads to a different result. You can perform this test only if you can be sure that nothing else has changed in your modified environment. And that is possible only if you have a reproducible recipe for reconstructing your environment, in which you can change a single version reference for performing your experiment.
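The single-change experiment described above can be sketched by treating a recipe as a mapping from component names to version references. This is a toy model under stated assumptions: the component names and version strings are illustrative, and a real recipe would also have to pin build tools and every transitive dependency.

```python
def with_component(recipe, name, version):
    """Return a copy of the recipe with exactly one version reference replaced."""
    if name not in recipe:
        raise KeyError(f"unknown component: {name}")
    changed = dict(recipe)  # leave the original recipe untouched
    changed[name] = version
    return changed


def diff(old, new):
    """Map each component whose version differs to an (old, new) pair."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Hypothetical baseline recipe, for illustration only.
baseline = {
    "python": "3.11.4",
    "numpy": "1.25.2",
    "openblas": "0.3.23",
    "gcc": "12.3.0",
}

variant = with_component(baseline, "numpy", "1.26.0")

print(diff(baseline, variant))  # {'numpy': ('1.25.2', '1.26.0')}
```

The difference between the two recipes is exactly one entry, which is what makes the comparison meaningful. No such check is possible for two frozen images without recipes.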

The best practice for reproducible research is to perform research computations only in frozen, and ideally reproducible, environments. Projects that rely on frequent updates for some software components should manage the evolution of their environment as a time series of reproducible environments, and for each computed result keep a reference to the environment that was used.
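One possible way to keep such a reference is to derive an identifier from each recipe and store it with every result. The sketch below is one such scheme, not a prescribed one. The names `environment_id`, `Result` and `EnvironmentSeries`, the result name, and all values are hypothetical, and the small dictionaries stand in for complete recipes.

```python
import hashlib
import json
from dataclasses import dataclass


def environment_id(recipe):
    """Identify an environment by the SHA-256 digest of its canonical recipe."""
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Result:
    name: str
    value: float
    environment: str  # identifier of the environment that produced the value


class EnvironmentSeries:
    """An append-only time series of environment recipes, indexed by identifier."""

    def __init__(self):
        self._order = []    # identifiers, oldest first
        self._recipes = {}  # identifier -> recipe

    def add(self, recipe):
        """Record a recipe (once) and return its identifier."""
        env_id = environment_id(recipe)
        if env_id not in self._recipes:
            self._recipes[env_id] = dict(recipe)
            self._order.append(env_id)
        return env_id

    def recipe_for(self, result):
        """Look up the recipe of the environment a result was computed in."""
        return self._recipes[result.environment]


series = EnvironmentSeries()
env_v1 = series.add({"python": "3.11.4", "numpy": "1.25.2"})
env_v2 = series.add({"python": "3.11.4", "numpy": "1.26.0"})

results = [
    Result("mean_energy", 0.5, env_v1),
    Result("mean_energy", 0.5, env_v2),
]

for r in results:
    print(r.name, r.value, "numpy", series.recipe_for(r)["numpy"])
```

Because the identifier is computed from the recipe itself, two results carry the same reference exactly when they were computed from the same recipe.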