
Letting Users Install Stuff

2024/07/20

On computer systems with multiple users, there is a clear distinction between users and administrators. For example, people might not want everyone to be able to see their files or kill the programs they are currently working with. The operating system kernel, of course, has to be able to do all of these things (since it is more powerful than any of the user processes it is responsible for). Meanwhile, administrators are supposed to be able to do anything the kernel can; we can't really call them administrators if they can't.

A computer system, of course, consists of parts that are shared between the users. This includes dynamic libraries and programs that all of them use. Users, after all, are just users; they use the programs made available to them by the admins.

If, on the other hand, the users are slightly more sophisticated, they can just compile their own programs from source code or copy binaries created for this specific system. This is pretty similar to what admins are doing anyway: they're downloading binaries by hand or compiling them from source, making them available for everyone.

Assuming, of course, that this is a big UNIX system and we are in the 80s.

If this is not a big UNIX system and / or we are not in the 80s, this assumption results in various kinds of stupid consequences instead.

Package managers

Linux systems in the 90s came up with a nice invention: package managers. Instead of admins having to install software from random tarballs one by one, you could just issue a command that installs packages from a central repository and also takes care of all the dependencies they might need.
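As a rough illustration of what "taking care of all the dependencies" means, here is a toy sketch in Python. The repository contents are made up, and a real package manager would of course fetch, verify, and unpack archives (and handle cycles) rather than just printing.

    # A toy dependency resolver: before installing a package, install
    # everything it depends on first. All names below are made up.
    REPOSITORY = {
        "leaf-drawing-lib": ["libpng", "zlib"],
        "libpng": ["zlib"],
        "zlib": [],
    }

    def install(package, installed=None):
        installed = set() if installed is None else installed
        for dependency in REPOSITORY[package]:
            if dependency not in installed:
                install(dependency, installed)   # dependencies go in first
        if package not in installed:
            print(f"installing {package}")       # a real tool would fetch and unpack here
            installed.add(package)
        return installed

    install("leaf-drawing-lib")   # installs zlib, then libpng, then the library itself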

This made fairly complex trees of dependencies possible. Part of a distribution's job is to make sure that the binaries it provides use a consistent set of library dependencies. Given the sad state of binary compatibility on Unix or Linux systems, this is necessary: if you want to ship a binary, you need to compile it against the very specific library versions the distribution provides. That's why, for example, pre-compiled binaries from a third party look like this:

[Image: the various Linux distributions the NVIDIA CUDA toolkit is available for.]

There is a reason why Gentoo Linux is not in that list: it's impossible to ship a binary that would work with every combination of dependencies its flexibility allows.

Given the complexities of all this, you clearly wouldn't want to let a plain user of the system upgrade a system library; things might break pretty easily.

DLL Hell

An example of what happens if you do let users do this is Windows 9x in the 90s. Lacking a package manager, installers typically bundled all the libraries they depended on; at the same time though, due to the way COM works, these libraries had to be installed globally on the system. Theoretically, this works if everyone is polite enough not to downgrade newer versions that someone already installed, assuming perfect backwards compatibility. In practice, the result was the terrible chaos of installers replacing parts of other programs with their own versions, causing them to stop working.

Part of the solution was a system called Windows Side-by-Side (WinSxS), allowing the installation of multiple versions of the same library at once, letting programs request their own specific one. (If you ever wondered why the Windows system directory of the same name is so large: this is why.)

Overall though, this is still a hard problem, with the potential of risking the entire system if you overwrite or add the wrong thing. We clearly need administrators to deal with this, right?

Package managers (the other kind)

We're back on Linux. We installed our distribution, which consists of a set of packages carefully put together by wise and responsible maintainers. Depending on which distribution we use, these might or might not be especially up-to-date versions; for example, Debian Stable is famous for being conservative with upgrades.

Let's say you're writing a program in Python, rendering spherical trees on the screen. Your code uses the latest version of a cool leaf drawing library, just pushed to git by its developer.

How do you add this library to your system? Do you wait for Debian Stable to pick it up three years later? Do you just download the Python files into a random directory? Maybe we could create another package manager instead, one that users can use to create development environments!

So this is how pip happened. (Along with the other, excessively many Python packaging solutions.) Also, npm. Also, cargo. Also, Maven / Ivy / Gradle / etc. And Go Modules. And... (I think the general idea has come across by this point.)

People use these because they are not operating system specific; you also don't need to wait for package maintainers to pick up your package, and then for the admin to globally install the one and only version on your system. (This is nice even if you do happen to be the admin.)
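To make that concrete, here is a minimal sketch of the idea using only Python's standard library: it builds a private environment under the user's home directory and installs a library straight from git, with no administrator involved. The environment path, library name, and URL are hypothetical.

    import subprocess
    import venv
    from pathlib import Path

    # A private, per-project environment; everything lives in the user's home
    # directory, so no root or admin rights are needed.
    env_dir = Path.home() / ".local" / "envs" / "spherical-trees"
    venv.create(env_dir, with_pip=True)

    # Install the latest leaf drawing library straight from its git repository
    # (hypothetical URL), without waiting for any distribution to package it.
    subprocess.run(
        [env_dir / "bin" / "pip", "install",
         "git+https://example.org/leaf-drawing-lib.git"],
        check=True,
    )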

How you ship the results of all this is a separate question. With some programming languages, you might be able to just statically link a binary that doesn't have a lot of external dependencies. But then... what you build might still depend on your exact distribution, at least in parts. It also depends on libraries the distribution doesn't have (since you took them from another package manager).

How do you even ship this thing?

At least it does run on your computer. Now, wouldn't it be convenient if everyone else had your computer too?

Containers

Docker, in fact, is a nice solution for this problem. You just build a container using the base image of a specific Linux distribution. You add your programming language specific libraries, along with your own code. Then you ship the end result. Because it doesn't depend on anything from the host distribution, it will work anywhere.

Well, except that in the above paragraph, by "nice solution" we mostly mean "epically terrible solution, but at least it works and we couldn't really think of anything else".

Part of it is about Docker specifically. It reinvents a decent number of mechanisms that UNIX operating systems already have, except in worse ways. (If you're looking for a write-up on why Docker specifically is terrible, "Docker Considered Harmful" is a nice one.) But even implementing containers in better ways is still not an especially good solution for the actual problem of shipping the right set of dependencies.

To begin with, dealing with one computer is a lot simpler than dealing with multiple computers. Managing containers is a step towards turning your computer into multiple ones: each has a different operating system, with a separate set of libraries, a separate list of processes, a different view of the file system. Although Docker has some tricks for not literally replicating every single byte of the underlying operating system in each container, it is still not smart enough to figure out that we just installed the same packages on each of them, only in a different order. This is not surprising: we are attacking the problem at the wrong level of abstraction.

This level of isolation is sometimes justified. Is it justified just to deal with library dependencies? Not really.

The root cause

The reason why all of this is fairly broken is relatively simple: we keep not letting users install packages.

Seemingly, we don't let them do this because it might affect other users or programs in negative ways. Oddly enough though, we have figured out how to solve this exact problem for running programs: we have process isolation and user accounts. As such, most operating systems do allow users to download, compile, and run arbitrary binaries as ordinary users. The difference is that we do not have access to the well-organized system-side package manager while doing this; all we can do is throw binaries into our home directories and hope that they work.

Pretty much all of the solutions mentioned above are workarounds for this problem. We create programming language specific package managers because we have more control over what we install as regular users. We statically link binaries so that we have control over what libraries get loaded at runtime, instead of trusting the system with this. And we ship our entire operating system as a container so that we don't have to interact with anything but the OS kernel.

This is not just a problem with Linux; it's relevant for Windows too. Until recently, you couldn't install COM DLLs as a regular user, since you had to register them in the system-wide registry. Lately, you can override these with per-user ones; not a lot of installers use this feature though. And even with the ones that do, registering a DLL results in a user-wide install, which might still have too broad an impact. As a consequence, not even touching the registry counts as a feature (despite the registry being a pretty nice way of storing configuration info).

So, to reach the ambitious goal of being able to let users install packages, we clearly need to add more separation mechanisms to the operating system itself. Users and programs should be able to specify what they would like to see. This is similar to how, with virtual memory, different processes have a different view of the physical memory in the machine, each thinking they have their own separate computer.

Is it even possible to get this right though?

Nix & Guix

These are two Linux distributions that I think are on the right track. Their main innovation is that instead of having a single set of packages available for everyone to use, you specifically declare which versions of everything you want. For example, if I, as a regular user, need version 3 of our favorite leaf drawing library, I can just ask the operating system to make it available for me. If another user wants version 2, they will see that one instead, even though both versions are in fact available on the system.

The experience from the user side is somewhat similar to that of a container: you get an environment that you fully control as a regular user. However, unlike with containers, the OS is fully aware of what packages are available. There is a lot less redundancy, and you do not need to ship around multi-gigabyte images or reinstall the same packages over and over.

Is this perfect? Not really. Accomplishing these goals on an operating system with a basic design from the 70s and no good culture of binary compatibility is fairly hard. They also add a lot of moving parts (Nix with an entire programming language of its own, Guix with Scheme) that can easily go wrong. (I did try the latter a year or two ago; things definitely take longer than installing Debian, for questionably justified reasons.)

They do exist and do work though!

Is the solution simple?

We can likely come up with something that's nicer than containers.

Versioning of dependencies is still a hard problem though. In an ideal world, there is exactly one version of everything. In practice, though... Do we pick the library built into the system? (That way we are more compatible with the rest of the system, but it might break us unexpectedly.) Do we require our own version? (That comes with fewer surprises around what it does, but it might not talk to the rest of the world as well, and it doesn't get security upgrades either.) Can we even use the system version if we have to recompile each time it changes? And what about the requirements of the libraries we are using? What if they disagree on which versions of a shared dependency they accept?
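As a tiny, hypothetical illustration of that last question (all names and version ranges are made up):

    # Two libraries we depend on disagree about which versions of a shared
    # dependency they accept; no single version satisfies both constraints.
    requirements = {
        "tree-renderer": {"leaf-drawing-lib": ">=3.0"},
        "bark-textures": {"leaf-drawing-lib": "<3.0"},
    }
    # A resolver that insists on exactly one shared copy has no valid solution
    # here; the remaining options are vendoring, patching one of the libraries,
    # or letting two copies coexist.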

In the end, there is no good way to put together terribly written, rigid pieces of code and expect the result to work.

It would still be an improvement, though, to have a nice OS supporting us while we try to get things done, without having to juggle four different packaging systems along the way.