
Persistence

2022/03/21
This is post no. 96 for Kev Quirk's #100DaysToOffload challenge. The point is to write many things, not to write good ones. Please adjust quality expectations accordingly :)

In UNIX, everything is a file.

Well, almost. Everything except... TCP ports, for example. But... on the other hand, every process on the system has an associated directory in /proc/[pid]/.

Except... of course, processes don't persist across restarts.

But... what if they did?

Let's make everything persistent!

After all, processes are just a bunch of mapped memory areas and some thread state. Some of the mapped areas are files. Some of them are just memory the process allocated.
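On Linux you can see this split directly: /proc/[pid]/maps lists every mapped area of a process, one per line. Here is a minimal sketch (Linux-only; `summarize_mappings` is a made-up name for illustration) that adds up how much of a process is file-backed and how much is plain anonymous memory, using the inode column (0 means "not backed by a file").

```python
from collections import Counter


def summarize_mappings(pid: str = "self") -> Counter:
    """Count the bytes of a process's mappings, split into file-backed and anonymous.

    Linux-only: parses /proc/<pid>/maps, whose lines look like
    'start-end perms offset dev inode [pathname]'.
    """
    sizes: Counter = Counter()
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            # maxsplit=5 keeps pathnames that contain spaces in one field
            fields = line.split(maxsplit=5)
            start, end = (int(x, 16) for x in fields[0].split("-"))
            inode = int(fields[4])
            # Areas backed by a file have a non-zero inode; the heap, the stack
            # and other anonymous memory have inode 0.
            kind = "file-backed" if inode != 0 else "anonymous"
            sizes[kind] += end - start
    return sizes


if __name__ == "__main__":
    for kind, nbytes in summarize_mappings().items():
        print(f"{kind:12} {nbytes // 1024:>10} KiB")
```

Only the anonymous part is what a "save the process" scheme would really have to write out; the file-backed part is, in a sense, already on disk.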

Of course, part of the point of memory is that you don't have to persist it to disk all the time. But... maybe you could?

In fact, there is CRIU (Checkpoint/Restore In Userspace), which does just that: it saves an actual running process, so that you can restore it later, or even move it to another machine! (... and then you hope that it has the exact same CPU architecture, otherwise things won't go especially well.)

But you don't even need to go all the way: e.g. AS/400 probably can't persist processes themselves, but it does blur the line between "data loaded in memory" and "data on disk" with its single-level store, where everything lives in one big address space.
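Plain `mmap` on any Unix gives a small taste of that blurring. This is not how AS/400's single-level store works; it's just an illustration, and `bump_persistent_counter` is a hypothetical name. The program only ever touches "memory", but because the memory is a shared mapping of a file, the value is still there the next time any process maps it.

```python
import mmap
import os
import struct

COUNTER = struct.Struct("<Q")  # one little-endian unsigned 64-bit integer


def bump_persistent_counter(path: str) -> int:
    """Increment a counter that lives in a memory-mapped file and return the new value.

    Not AS/400: just a shared file mapping, where 'memory' and 'file' are
    the same pages, so the value survives the process exiting.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # First run: grow the (empty) file so there is something to map; new bytes are zero.
        if os.fstat(fd).st_size < COUNTER.size:
            os.ftruncate(fd, COUNTER.size)
        # On Unix this is a MAP_SHARED, read-write mapping by default.
        with mmap.mmap(fd, COUNTER.size) as mem:
            (value,) = COUNTER.unpack_from(mem, 0)
            COUNTER.pack_into(mem, 0, value + 1)
            mem.flush()  # write the dirty page back now instead of "eventually"
            return value + 1
    finally:
        os.close(fd)
```

Calling it from two separate runs of a program returns 1 and then 2: there is no explicit "save" step anywhere, only the `flush()` that decides *when* the page reaches the disk.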

... but why?

As in: sure, it's an interesting theoretical question (... for... some?); what's the point though? Isn't it simpler to just restart processes instead of trying to save them? Also, what if the process crashes anyway? If you don't really have a separate "save" operation, how do you recover from a crash?

Well... I'm not arguing for keeping everything forever. It's just that the model could be a lot less rigid.

For example, Lisps have a workflow where, instead of recompiling and restarting your executable, you have one running process you're working on; you keep changing it bit by bit, without any major restarts. Of course, eventually, you need to figure out how to recreate the state from scratch just by loading a series of source files... but it's beneficial that you don't have to do this every time.
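Python is not a Lisp and has no saved images, but the "keep changing the running process" half of that workflow can be imitated crudely. In this sketch (all names are hypothetical), `redefine` evaluates new source into the live module's namespace; because `handle` looks `greet` up by name at call time, the new definition takes effect immediately, while the data accumulated in `sessions` stays untouched.

```python
# State that lives exactly as long as this process does: name -> number of visits.
sessions = {}


def greet(name: str) -> str:
    sessions[name] = sessions.get(name, 0) + 1
    return f"Hello, {name}!"


def handle(name: str) -> str:
    # `greet` is resolved through the module's globals on every call,
    # so redefining it changes behaviour without restarting anything.
    return greet(name)


def redefine(source: str) -> None:
    """Evaluate new definitions into this module's namespace while it keeps running."""
    exec(source, globals())
```

For example, after `handle("kev")` has returned `"Hello, kev!"`, passing `redefine` a new `greet` that reports the visit count makes the very next `handle("kev")` answer with visit 2: the behaviour changed, the state did not reset.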

Persistent server processes

Imagine a server machine with services that are used fairly rarely. The standard solution for this is to use a daemon like xinetd, which starts up the right service whenever it's needed.
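The core of that inetd-style trick fits in a few lines. This is a toy sketch, not xinetd's code or configuration; `superserve` and the upper-casing service are hypothetical. No service process exists until a client connects, and each connection gets a fresh process whose stdin and stdout *are* the socket, so the service itself never has to know about networking.

```python
from __future__ import annotations

import socket
import subprocess
import sys

# Hypothetical on-demand service: read one line, answer with it upper-cased.
UPPERCASE_SERVICE = [
    sys.executable, "-c",
    "import sys; print(sys.stdin.readline().strip().upper(), flush=True)",
]


def superserve(listener: socket.socket, cmd: list[str], max_conns: int | None = None) -> None:
    """Start one service process per accepted connection (inetd-style).

    The connected socket becomes the child's stdin and stdout.
    max_conns=None means serve forever.
    """
    children: list[subprocess.Popen] = []
    handled = 0
    while max_conns is None or handled < max_conns:
        conn, _addr = listener.accept()
        with conn:  # the parent's copy is closed; the child keeps its own
            children.append(subprocess.Popen(cmd, stdin=conn, stdout=conn))
        # Reap children that have already exited so the list doesn't grow forever.
        children = [c for c in children if c.poll() is None]
        handled += 1
    for child in children:
        child.wait()
```

On a real port this would be something like `superserve(socket.create_server(("", 8080)), UPPERCASE_SERVICE)`. The price is that every connection pays the full start-up cost of the service, which is exactly what persisting an already-started process would avoid.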

Alternatively, you can just start them up and let them get swapped out; the result is similar (in that they aren't using a lot of actual memory all the time).

What if we could just make this more explicit and persist the relevant processes to disk once they have been (once!) started?

What if this were part of the service's build process? The build would produce a file, symlinked somehow to e.g. port 80, so that when a request showed up, the OS could just resume that process as if it had been running all along... since... there wouldn't be any difference between "already running" and "persisted on disk"?
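Persisting a whole process needs something like CRIU, but the shape of "resume on request" can be sketched at the application level. In this sketch a JSON file stands in for the persisted process, and `on_request` stands in for the imagined OS hook that would run when a request hits the port; all names here are hypothetical. Between requests nothing is running at all, yet every request sees the state left behind by the previous one.

```python
import json
import os
import tempfile


def resume(path: str) -> dict:
    """Load the service's state from disk, or start fresh the very first time."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"entries": []}


def handle_request(state: dict, request: str) -> str:
    """The (hypothetical) service itself: a guestbook that remembers every request."""
    state["entries"].append(request)
    return f"stored entry #{len(state['entries'])}"


def hibernate(state: dict, path: str) -> None:
    """Write the state back atomically, so a crash mid-write keeps the previous snapshot."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename over the old snapshot
    except BaseException:
        os.unlink(tmp)
        raise


def on_request(path: str, request: str) -> str:
    """Stand-in for the imagined OS hook: resume, serve one request, persist again."""
    state = resume(path)
    reply = handle_request(state, request)
    hibernate(state, path)
    return reply
```

The atomic `os.replace` is one possible answer to the earlier crash question: if the service dies mid-request, the last good snapshot is still there to resume from.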

... comments welcome, either in email or on the (eventual) Mastodon post on Fosstodon.