PATH should be a system call
(... but... it's a variable... how do you even)
Let us present the problem.
This is Emacs starting up and loading some Lisp files, for which it first needs to figure out where they are.
As it happens, they could be in many possible locations. These locations are listed in the load-path variable, and our method is to check each of them, in order, for whether the file is there. (Also, some of them might come gzipped, so let's check for those, too.)
On my not especially overcomplicated Emacs install, the list has 59 elements.
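To put a number on it, here is a minimal Python sketch of that naive search. It assumes, for illustration only, that each directory is tried with the suffixes `.elc`, `.elc.gz`, `.el` and `.el.gz`. Emacs derives its real list from `load-suffixes` and `load-file-rep-suffixes`, which can differ (for example, when module support adds a shared-library suffix). `naive_load_search` is a hypothetical name, not Emacs's `load`. Every (directory, suffix) pair costs one existence check.

```python
import os
import tempfile

# Assumed suffix order for illustration: compiled first, then source,
# each optionally gzipped. Emacs's real order may differ.
SUFFIXES = [".elc", ".elc.gz", ".el", ".el.gz"]


def naive_load_search(feature, load_path):
    """Return (first matching path or None, number of existence checks made)."""
    checks = 0
    for directory in load_path:
        for suffix in SUFFIXES:
            candidate = os.path.join(directory, feature + suffix)
            checks += 1  # one stat()-like call per combination
            if os.path.isfile(candidate):
                return candidate, checks
    return None, checks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # 59 empty directories, mirroring the load-path length above.
        load_path = []
        for i in range(59):
            d = os.path.join(root, f"dir{i:02d}")
            os.mkdir(d)
            load_path.append(d)
        # Put the file we want in the last directory.
        open(os.path.join(load_path[-1], "foo.el"), "w").close()

        found, checks = naive_load_search("foo", load_path)
        print(os.path.basename(found), checks)  # foo.el 235
        missing, checks = naive_load_search("bar", load_path)
        print(missing, checks)  # None 236
```

With 59 directories and four suffixes, a file that exists nowhere costs 59 × 4 = 236 checks, and that is per file being loaded.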
At first sight this sounds like a very niche problem. Not only is it about Emacs, it's also on Windows, which is somewhat known for its less than excellent performance with small files.
As it happens, though, bash on Linux does the exact same thing. We have a list of directories in PATH, and whenever we want to launch a program, we go and check each and every one of them for the file we are looking for. We're fairly lucky, though: the list is pretty short.
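The lookup itself is a short loop. Below is a simplified Python sketch. `find_in_path` is a hypothetical name; it is not bash's code, and it skips bash's hashing of previously found commands. The standard library's `shutil.which` does essentially the same job. Each directory costs one stat() call, which corresponds to the per-directory `newfstatat` lines in the trace that follows.

```python
import os
import stat


def find_in_path(command, path=None):
    """Return the first executable named `command` on PATH, or None.

    Simplified: empty PATH entries are treated as the current directory
    (as POSIX shells do), and no result caching is done.
    """
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", command)
        try:
            st = os.stat(candidate)  # one stat per directory, as in the trace
        except OSError:
            continue  # ENOENT and friends: try the next directory
        if stat.S_ISREG(st.st_mode) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Output depends on the system's PATH and installed programs.
    print(find_in_path("sh"))
    print(find_in_path("asdklfjasldfjaskldfasdljf"))
```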
~ $ strace bash -c asdklfjasldfjaskldfasdljf
(...)
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
newfstatat(AT_FDCWD, "/home/simon/bin/asdklfjasldfjaskldfasdljf", 0x7ffe5ff8d3c0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/asdklfjasldfjaskldfasdljf", 0x7ffe5ff8d3c0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/asdklfjasldfjaskldfasdljf", 0x7ffe5ff8d3c0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/bin/asdklfjasldfjaskldfasdljf", 0x7ffe5ff8d3c0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/games/asdklfjasldfjaskldfasdljf", 0x7ffe5ff8d3c0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/games/asdklfjasldfjaskldfasdljf", 0x7ffe5ff8d3c0, 0) = -1 ENOENT (No such file or directory)
... except wait, now we're looking for ourselves?
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
newfstatat(AT_FDCWD, "/home/simon/bin/bash", 0x7ffe5ff8d490, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/bash", 0x7ffe5ff8d490, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/bash", 0x7ffe5ff8d490, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/bin/bash", {st_mode=S_IFREG|0755, st_size=1265648, ...}, 0) = 0
newfstatat(AT_FDCWD, "/bin/bash", {st_mode=S_IFREG|0755, st_size=1265648, ...}, 0) = 0
... and also... let's not forget about our localized messages.
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
newfstatat(2, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}, AT_EMPTY_PATH) = 0
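Those six `openat` attempts follow a pattern: the full locale name first, then with the codeset normalized, then without a codeset, and then the same again with only the language. The Python sketch below reproduces that order for this case. It is a simplification, not glibc's actual lookup code (which also handles `@modifier` and a few more normalization rules), and `locale_candidates` is a hypothetical name. It is the same shape of problem again: a list of candidate names, tried one by one.

```python
import os


def locale_candidates(domain, locale_name, base="/usr/share/locale"):
    """List the .mo paths to try, most specific first (simplified sketch)."""
    lang_territory, _, codeset = locale_name.partition(".")
    language = lang_territory.split("_")[0]
    # Normalized codeset: lowercase, non-alphanumerics removed ("UTF-8" -> "utf8").
    normalized = "".join(ch for ch in codeset.lower() if ch.isalnum())

    names = []
    for prefix in dict.fromkeys([lang_territory, language]):  # dedupe, keep order
        if codeset:
            names.append(f"{prefix}.{codeset}")
            if normalized != codeset:
                names.append(f"{prefix}.{normalized}")
        names.append(prefix)

    return [os.path.join(base, n, "LC_MESSAGES", f"{domain}.mo") for n in names]


if __name__ == "__main__":
    for path in locale_candidates("bash", "en_US.UTF-8"):
        print(path)
```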
As it happens, Python is slightly smarter than either of the two above. Instead of trying various file names, it just lists directories right away; it is probably this, plus some caching, that lets it find modules pretty quickly. (We're still looking for __init__.py and similar files one by one, though.)
simon@anarillis ~/tmp> strace -f python3 -m our_test_dir.our_test_moduleb 2>&1 |grep our_test
execve("/usr/bin/python3", ["python3", "-m", "our_test_dir.our_test_moduleb"], 0x7ffc087c2c38 /* 17 vars */) = 0
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir/__init__.cpython-311-x86_64-linux-gnu.so", 0x7ffc3025b8e0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir/__init__.abi3.so", 0x7ffc3025b8e0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir/__init__.so", 0x7ffc3025b8e0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir/__init__.py", 0x7ffc3025b8e0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir/__init__.pyc", 0x7ffc3025b8e0, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
newfstatat(AT_FDCWD, "/home/simon/tmp/our_test_dir", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
# here is the dir listing!
openat(AT_FDCWD, "/home/simon/tmp/our_test_dir", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
write(2, "/usr/bin/python3: No module name"..., 64/usr/bin/python3: No module named our_test_dir.our_test_moduleb
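The list-once-then-look-up idea can be sketched in a few lines of Python. `CachedDirFinder` is a hypothetical class, not CPython's `importlib` code (whose `FileFinder` works along broadly similar lines but handles far more). Invalidating the cache when the directory's modification time changes is an assumption of this sketch, and like any mtime-based cache it can miss changes made within the timestamp's resolution.

```python
import os


class CachedDirFinder:
    """Find files in one directory by listing it once and caching the names.

    One directory read replaces one stat() per candidate name. The cache is
    refreshed when the directory's mtime changes, since creating or deleting
    an entry updates it.
    """

    def __init__(self, directory):
        self.directory = directory
        self._mtime = None
        self._names = set()

    def _refresh_if_stale(self):
        mtime = os.stat(self.directory).st_mtime_ns  # one cheap stat per lookup
        if mtime != self._mtime:
            self._names = set(os.listdir(self.directory))  # the single listing
            self._mtime = mtime

    def find(self, candidates):
        """Return the path of the first candidate present in the directory, or None."""
        self._refresh_if_stale()
        for name in candidates:
            if name in self._names:  # set lookup, no system call
                return os.path.join(self.directory, name)
        return None
```

Passing the suffix variants probed in the trace (`__init__.cpython-311-x86_64-linux-gnu.so`, `__init__.abi3.so`, `__init__.so`, `__init__.py`, `__init__.pyc`) as `candidates` would cost one listing plus set lookups instead of five separate stat() calls.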
Nevertheless, it seems that "trying to find files with a set of possible names in a set of possible directories" is a fairly common operation that not everyone has optimized yet.
(Also, is "optimizing" this really a good goal? Or does it just stand for "OK workarounds for missing file system APIs"?)
Solving this a nicer way
How about... instead of asking the operating system for a combination of n files at m different places, we could just give it the list of possible files and the list of possible places?
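To make the proposed interface concrete, here is a hypothetical `find_first(names, dirs)` call, emulated in user space in Python. This is only an illustration of the shape of such an API; the emulation still lists each directory separately, so it makes at most m directory reads instead of up to n × m individual lookups, whereas a real system call could collapse that into a single request. The ordering rule (directory order wins over name order, like PATH) is an assumption.

```python
import os


def find_first(names, dirs):
    """Hypothetical batched lookup: return the first existing dirs[i]/names[j].

    Directory order takes priority; within a directory, earlier names win.
    Only checks that an entry with that name exists, not its type.
    """
    wanted = list(names)
    for directory in dirs:
        try:
            with os.scandir(directory) as entries:
                present = {e.name for e in entries}  # one listing per directory
        except OSError:
            continue  # missing or unreadable directory: skip it, as PATH lookup does
        for name in wanted:
            if name in present:
                return os.path.join(directory, name)
    return None
```

For a 59-entry load-path and four suffixes, that is at most 59 directory reads rather than up to 236 stat() calls.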
This would already cut down on the number of system calls and, if this goes over a network, on the number of round trips.
AS/400 libraries, by the way, solve a very similar problem. While I'm not sure what implementation they use underneath, they at least have a good chance of not having to try every combination every time, given their database "filesystem".
But in the end, we are just trying to perform a query: select all the files ever WHERE they have one of the given names, and then pick the ones in the directories we prefer most (e.g. the ones that come first on PATH). That's it.
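Written down as an actual query, it might look like the following. This uses Python's built-in `sqlite3` module as a stand-in for Postgres, and the `files`/`search_path` schema, along with the `make_db` and `lookup` helpers, is a made-up illustration, not how any real file system stores its metadata.

```python
import sqlite3


def make_db(files, search_path):
    """Build an in-memory 'file system' table plus a ranked search path (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE files (dir TEXT NOT NULL, name TEXT NOT NULL);
        CREATE INDEX files_by_name ON files (name);
        -- Lower priority value = preferred directory (e.g. earlier on PATH).
        CREATE TABLE search_path (dir TEXT PRIMARY KEY, priority INTEGER NOT NULL);
    """)
    conn.executemany("INSERT INTO files VALUES (?, ?)", files)
    conn.executemany("INSERT INTO search_path VALUES (?, ?)",
                     [(d, i) for i, d in enumerate(search_path)])
    return conn


def lookup(conn, names):
    """Return dir/name of the best match: earliest search-path entry first, then earliest name."""
    if not names:
        return None
    values = ", ".join("(?, ?)" for _ in names)
    params = [x for prio, name in enumerate(names) for x in (name, prio)]
    row = conn.execute(f"""
        WITH wanted(name, prio) AS (VALUES {values})
        SELECT f.dir || '/' || f.name
        FROM files AS f
        JOIN wanted AS w ON w.name = f.name
        JOIN search_path AS s ON s.dir = f.dir
        ORDER BY s.priority, w.prio
        LIMIT 1
    """, params).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = make_db(
        files=[("/bin", "bash"), ("/usr/bin", "python3"), ("/home/simon/notes", "bash")],
        search_path=["/home/simon/bin", "/usr/local/bin", "/usr/bin", "/bin",
                     "/usr/games", "/usr/local/games"],
    )
    print(lookup(db, ["bash"]))  # /bin/bash
    print(lookup(db, ["asdklfjasldfjaskldfasdljf"]))  # None
```

The index on `name` is what lets the database jump straight to the few rows with a matching name, instead of probing every directory in turn.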
As it happens, Postgres can solve this problem extremely well and quickly. (... there might be a blog post on how, at some point.)
Could it be something that the operating system or the file system just... does for you, quickly and efficiently?