
Memory mapped FS

2021/09/01
This is post no. 30 for Kev Quirk's #100DaysToOffload challenge. The point is to write many things, not to write good ones. Please adjust quality expectations accordingly :)

We invented memory-mapped files a while ago. It's a nice application of virtual memory: you ask the OS to map a file, it gives you a memory address range, and each time you read from that range, it reads the relevant part of the file from disk and loads it into memory. To you, it looks as if the file contents had instantly appeared in memory.

So, if you memory-map a multi-gigabyte file, this won't immediately result in the OS reading in everything. It'll work like memory you swapped out to disk. In many OSes, it's actually exactly the same mechanism. Even better: in some OSes, including NT derivatives, traditional file IO operations also go through the same memory-mapped pages eventually.
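For reference, here is a minimal sketch of what mapping a single file looks like on a POSIX system. The helper names map_file_readonly and count_newlines are made up for this post; the calls underneath (open, fstat, mmap, close) are the standard ones. Error handling is kept to the bare minimum.

```c
#include <fcntl.h>     /* open */
#include <stddef.h>
#include <stdio.h>     /* perror */
#include <string.h>    /* memchr */
#include <sys/mman.h>  /* mmap, munmap */
#include <sys/stat.h>  /* fstat */
#include <unistd.h>    /* close */

/* Map a whole file read-only. Returns NULL on failure or for an empty file
   (mmap rejects zero-length mappings). The kernel is not required to read
   any file data here; pages are brought in when they are first touched. */
const unsigned char *map_file_readonly(const char *path, size_t *len_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    *len_out = (size_t)st.st_size;
    return addr;
}

/* Count newline bytes. Every page touched for the first time may cause a
   page fault that makes the OS load that part of the file. */
size_t count_newlines(const unsigned char *data, size_t len) {
    size_t n = 0;
    const unsigned char *p = data;
    const unsigned char *end = data + len;
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        n++;
        p++;
    }
    return n;
}
```

You would call map_file_readonly on a path, pass the returned pointer and length to count_newlines, and release the range with munmap when done. Note that there is no read() anywhere: the file is just bytes behind a pointer.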

Now, could we take an otherwise good idea to the extreme? "We Must Because We Can".

... let's memory-map everything.

And by that, I mean the entire file system.

This doesn't actually mean memory-mapping every file in it separately. That'd take a very long time... and how do you even give back pointers to where they were mapped?

No, what I mean is something like this:


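#include <stdint.h>

typedef struct file_t file_t; // lets us write file_t without the struct keyword

struct file_t {
  char *name;

  int dir_file_count;
  file_t **dir_files; // array of pointers to files contained in this directory

  uint8_t *contents;
};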


int main(int argc, char** argv, file_t *root) {
   // ... descend from root and access all the files
}
        

(... if we really wanted to, we could have separate structs for files and directories, to make sure that directories can't also have contents. But then I actually think that we shouldn't have a separate concept for "directories" anyway, so we just go with a single struct here.)

You can imagine using this like:


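// Descend the tree of structs
file_t *etc = NULL;
for (int i = 0; i < root->dir_file_count; i++) {
  file_t *c = root->dir_files[i];
  if (strcmp(c->name, "etc") == 0) { // strcmp returns 0 on a match
    etc = c;
    break;
  }
}

file_t *our_file = NULL;
for (int i = 0; etc != NULL && i < etc->dir_file_count; i++) {
  file_t *c = etc->dir_files[i];
  if (strcmp(c->name, "our_file_name") == 0) {
    our_file = c;
    break;
  }
}

// assumes the contents are NUL-terminated text
if (our_file != NULL) {
  printf("here are the file contents: %s\n", our_file->contents);
}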
        

Of course, this is ugly and inefficient. We could have used some hash maps instead.
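Short of hash maps, the repeated loops can at least be folded into a helper. This is only a sketch: find_child and lookup_path are names I'm making up here, the struct is the one from above, and the lookup is still a linear scan per directory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct file_t file_t;

struct file_t {
    char *name;
    int dir_file_count;
    file_t **dir_files; /* files contained in this directory */
    uint8_t *contents;
};

/* Linear scan of one directory; returns NULL if no child has that name.
   The name is given as pointer + length so it need not be NUL-terminated. */
file_t *find_child(const file_t *dir, const char *name, size_t name_len) {
    for (int i = 0; i < dir->dir_file_count; i++) {
        file_t *c = dir->dir_files[i];
        if (strlen(c->name) == name_len &&
            strncmp(c->name, name, name_len) == 0) {
            return c;
        }
    }
    return NULL;
}

/* Resolve a '/'-separated path such as "etc/our_file_name", starting at
   root. Empty components (leading or doubled slashes) are skipped. */
file_t *lookup_path(file_t *root, const char *path) {
    file_t *cur = root;
    while (cur != NULL && *path != '\0') {
        size_t len = strcspn(path, "/"); /* length of the next component */
        if (len > 0) {
            cur = find_child(cur, path, len);
        }
        path += len;
        if (*path == '/') {
            path++;
        }
    }
    return cur;
}
```

With that, the whole descent becomes lookup_path(root, "etc/our_file_name"), which returns NULL if any component is missing. Still no syscalls involved, just pointer chasing.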

... anyway: what's happening in practice is that each time we dereference a pointer to a file we haven't looked into yet, the OS catches the page fault and puts an actual file struct in place, with the right file name, complete with further pointers to yet-nonexistent further files. From the point of view of the process, though, we have a single giant data structure that appears to contain the entire file system, and which we can access without any syscalls whatsoever, just by reading memory.
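To get a feel for the "catch the page fault and put something there" part, here is a user-space imitation. It is not how a kernel would do it, and it rests on several assumptions: Linux-like behaviour (the fault arrives as SIGSEGV with the address in si_addr), a made-up helper lazy_region_init, and a dummy fill pattern standing in for real file data. Also, mprotect is not officially on the list of async-signal-safe functions, although calling it from a fault handler like this is a common trick.

```c
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *region;   /* start of the lazily filled address range */
static size_t region_len;
static size_t page_size;
static volatile sig_atomic_t fault_count;

/* Runs on the first touch of each page: make the page accessible, then
   fill it. When the handler returns, the faulting read is retried. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;
    if (addr < base || addr >= base + region_len) {
        signal(SIGSEGV, SIG_DFL); /* not our range: let the crash happen */
        return;
    }
    size_t index = (addr - base) / page_size;
    uint8_t *page = region + index * page_size;
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
    /* Dummy data: page 0 is all 'A', page 1 all 'B', and so on. */
    memset(page, 'A' + (int)(index % 26), page_size);
    fault_count++;
}

/* Reserve `pages` inaccessible pages and install the fault handler.
   Returns 0 on success, -1 on failure. */
int lazy_region_init(size_t pages) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_len = pages * page_size;
    void *p = mmap(NULL, region_len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    region = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After lazy_region_init(4), nothing has been filled in yet. Reading region[0] faults once and then yields data; reading any other byte of that same page is an ordinary memory access. Replace the memset with "build a file struct" and you have roughly the idea, minus everything that makes it hard.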

Why is this a good idea?

Well, actually, it really isn't.

Imagine the joy of never ever being able to close a file ever again.

(... although you could argue that closing files is an archaic notion, since every file shall be open at any time.)

Also, since you cannot tell the OS that you won't be using a file again... what if someone else grows the file and suddenly you can't update your per-process memory map anymore because the added part would overlap with something you already mapped there?

Plus, it's an impressively ugly way of descending a directory hierarchy anyway.

... but then... why?

Actually, most of these problems go away once we treat pointers and memory addresses as implementation details... as in: using a memory-safe language. (... and using it as the actual OS interface so that the OS can garbage-collect unused files. Yes, that probably means "a Lisp Machine".)

After all, in the end, a file system itself is just a tree of binary blobs. Or, in fact, it's one binary blob (a hard disk, block device etc.) pretending it's a tree of binary blobs. If your memory model already involves a bunch of objects pointing at each other while not having obvious fixed linear addresses, a tree isn't that hard to fit in.

So, by further blurring the difference between "a piece of memory I got from malloc" and "a file I created on disk", we could further proceed towards making everything mathematically nice. This might also have questionable practical utility. But then... many things are useless until they aren't.

... comments welcome, either in email or on the (eventual) Mastodon post on Fosstodon.