Package managers

Package managers are tools for constructing software environments from building blocks called packages.

There are many package managers, focussing on different aspects of software development and deployment. We do not attempt to provide an exhaustive list. The following overview cites some package managers as representative examples of categories.

Deployment vs. development

Some package managers focus on constructing software environments for deployment. Such environments contain software written in many programming languages, but also documentation and possibly datasets. Package definitions are written by packagers, who are typically distinct from the authors of the software being packaged. Examples are apt, conda, and Guix. Their main strength is the ability to construct complex yet robust software assemblies. Their main weakness is the effort required to write and test package definitions.

Other package managers focus on constructing sub-environments for software development in a single programming language. Package definitions are typically written by the software authors themselves. Examples are pip (for Python), npm (for JavaScript), and cargo (for Rust). Their main strength is the management of evolving dependencies. Their main weaknesses are the limitation to a single language and the (intentional) lack of precise version references: a dependency is usually declared as a range of acceptable versions rather than as one exact version.
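To illustrate why imprecise version references matter, here is a minimal Python sketch. It is not the resolution algorithm of pip, npm, or cargo; the function resolve, the registry lists, and the version numbers are hypothetical and chosen only for illustration. It shows that the same requirement ("at least 1.2.0") can select different versions as new releases appear.

```python
from typing import List, Tuple

# A version is modelled as a (major, minor, patch) tuple (an assumption
# made for this sketch; real tools use richer version schemes).
Version = Tuple[int, int, int]


def resolve(minimum: Version, available: List[Version]) -> Version:
    """Pick the newest available version that is at least `minimum`."""
    candidates = [v for v in available if v >= minimum]
    if not candidates:
        raise LookupError("no version satisfies the requirement")
    return max(candidates)


# Hypothetical states of a package registry at two points in time.
registry_last_year = [(1, 2, 0), (1, 3, 0)]
registry_today = registry_last_year + [(1, 4, 0), (2, 0, 0)]

# The requirement itself never changes, but its result does.
resolved_last_year = resolve((1, 2, 0), registry_last_year)
resolved_today = resolve((1, 2, 0), registry_today)

print(resolved_last_year)  # (1, 3, 0)
print(resolved_today)      # (2, 0, 0)
```

The requirement is unchanged between the two calls, yet the environment it produces differs. This is convenient during development and a problem for reproducibility.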

In computational science, development-focused package managers are best suited for sharing code within a team of collaborators, whereas deployment-focused package managers are required for ensuring long-term reproducibility by a wider community. As a helpful analogy, consider a development package as the equivalent of a lab notebook, and a deployment package as the equivalent of a published paper.

General vs. specialized environment

Some package managers, in particular the earliest ones, are made for constructing entire operating system installations or significant subsets of them. Such installations group together many unrelated packages that make up the default working environment of a computer or of a single user account. Examples are apt and Homebrew.

Other package managers focus on constructing specialized environments for a single task or project. Examples are conda and Spack, but also all development-oriented package managers.

Some recent package managers, for example Guix, address both use cases.

For reproducibility, specialized environments are the most appropriate choice. They permit others to re-run and re-use published work without modifying their personal work environments, which have often been fine-tuned over many years.

Version vs. content-based references

Some package managers identify their software packages by arbitrarily assigned labels, usually consisting of a name and a version number. There is no general agreement on how version numbers should be interpreted, making version-based references inherently fragile.

A small number of recent package managers, mainly Nix and Guix, use references derived from the source code itself, a technique known as content addressing. For these package managers, changing a single letter in the source code yields a different reference, so every change is traceable.
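The idea of content addressing can be sketched in a few lines of Python. This is a simplified illustration, not the scheme that Nix or Guix actually use; the function content_reference, the reference format, and the package name "greeter" are assumptions made for this example.

```python
import hashlib


def content_reference(name: str, source: bytes) -> str:
    """Derive a reference from the source bytes (hypothetical format)."""
    digest = hashlib.sha256(source).hexdigest()
    # Keep a short prefix of the hash for readability in this sketch.
    return f"{digest[:16]}-{name}"


# Two sources that differ by a single letter.
original = b"print('hello')\n"
modified = b"print('hallo')\n"

ref_original = content_reference("greeter", original)
ref_modified = content_reference("greeter", modified)

# Identical content always yields the same reference ...
print(content_reference("greeter", original) == ref_original)
# ... whereas a one-letter change yields a different one.
print(ref_original != ref_modified)
```

Unlike a version number, such a reference is not assigned by anyone. It can be recomputed and checked by whoever holds the source code.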

For reproducibility in complex software assemblies, content-based references are essential for accurate and automatable bookkeeping. The use of imprecise version-based references is one of the main reasons for computational irreproducibility.

Single-level vs. recursive dependency tracking

This is a very technical distinction, but also a very important one for reproducibility.

Package managers must build a package from software source code. This step includes in particular compiling source code, but also related tasks such as preprocessing documentation and copying files into a suitable file system structure. Software builds happen in a computational environment of their own, and that environment is also assembled from packages. With single-level dependency tracking, the build environment is described as a list of requirements, for example "a C compiler". With recursive dependency tracking, the full construction of the build environment is recorded with precise references. Recursive dependency tracking requires a package manager that can handle specialized environments. In practice, it also requires content-based references for manageable bookkeeping.
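The difference between the two approaches can be sketched in Python. This is an illustrative model under stated assumptions, not the implementation used by Nix or Guix: the Package class, the two functions, and the package names are hypothetical. The sketch compares two builds of the same software that differ only in the revision of the compiler used to build it.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Package:
    """Hypothetical package model: a name, source bytes, and build inputs."""
    name: str
    source: bytes
    build_inputs: Tuple["Package", ...] = ()


def single_level_description(pkg: Package) -> Tuple[str, Tuple[str, ...]]:
    """Describe the build environment as a list of named requirements."""
    return (pkg.name, tuple(sorted(dep.name for dep in pkg.build_inputs)))


def recursive_reference(pkg: Package) -> str:
    """Derive a reference from the source and, recursively, all build inputs."""
    h = hashlib.sha256()
    h.update(pkg.name.encode())
    h.update(b"\0")
    h.update(pkg.source)
    # Each input contributes its own recursive reference.
    for ref in sorted(recursive_reference(dep) for dep in pkg.build_inputs):
        h.update(b"\0")
        h.update(ref.encode())
    return h.hexdigest()


compiler_v1 = Package("c-compiler", b"compiler source, revision 1")
compiler_v2 = Package("c-compiler", b"compiler source, revision 2")
app_with_v1 = Package("simulation", b"simulation source", (compiler_v1,))
app_with_v2 = Package("simulation", b"simulation source", (compiler_v2,))

# Single-level tracking cannot tell the two builds apart.
same_description = (
    single_level_description(app_with_v1) == single_level_description(app_with_v2)
)
# Recursive tracking records that the build environments differ.
same_reference = recursive_reference(app_with_v1) == recursive_reference(app_with_v2)

print(same_description)  # True
print(same_reference)    # False
```

In this model, a change anywhere in the chain of build inputs propagates to the reference of every package built with them, which is what makes the bookkeeping automatable.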

The lack of recursive dependency tracking is the most important cause of computational irreproducibility. Unfortunately, recursive dependency tracking is a new and not widely implemented technique, so far available only with Nix and Guix.