UNIX the Language

2021/07/17

... in which we introduce an admittedly weird point of view.

UNIX is a great operating system.

Built on the principle of composing tools that each do one thing only but do it well, UNIX has been immensely successful. Basically all smartphones (iOS, Android) and many desktops / laptops (macOS, Linux) run a UNIX-like OS; the command line itself is really powerful and even non-UNIX systems are trying to model it (PowerShell). Isn't this all great?

... or is it?

Well, in the above paragraph, I was ignoring the tiny issue that most of the users of the devices listed there won't ever see an actual command line. But... developers most likely will. Well, maybe. But... sure, let's ignore phones and most macOS users; developers of some of these platforms are still relying on the command line, and there are all the shell scripts underlying all of it!

... so the command line is great.

Of course it is. It's so powerful that it enables tools to become simple and effective. You can replace fancy GUI tools with some command line hackery in many cases... or avoid having to write tools in C++ that would take thousands of lines to complete, for the same functionality. You can just write a quick shell script, and...

... so shell scripts are great for programming.

Well, okay, no. Shell scripts are, actually, easy to mess up, hard to debug and slow. But shells are available everywhere, and you still get the benefits of the command line, so overall, especially for smaller things, they're great! Just... try not making them larger?

The point is: the UNIX command line is not really a programming language. It's the operating system. They're two different things, with very different purposes. One is to write software in, and then you launch these tools; it's different! It's not like anyone has managed to mix up these two. It's kinda obvious how you can't even imagine...

[Image: top of the case of a LISP Machine]

... well so um apart from LISP machines, you can't even...

So what if UNIX is a programming language?

Or... a "platform", really, with a standard library and everything.

Surprisingly, viewed from the right point of view, LISP machines and UNIX share a lot of properties!

To begin with, you're working in a dynamic environment. You don't need to restart your computer after modifying one of your shell scripts or recompiling one of your functions (in UNIX, they call them "programs"). Actually, everything is neatly persistent: your working image is automatically kept updated on disk, so, apart from some shell variables, you don't have to start over even after a restart; your functions / scripts are still there!
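A minimal sketch of that dynamism, with Python playing the role of the caller. Everything named here is made up for illustration: the `greet.py` script, the `call` helper and the temporary directory are assumptions, not anything UNIX prescribes. The "function" lives in a file; we call it, redefine it on disk, and call it again, all without restarting the caller.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def call(script: Path, *args: str) -> str:
    """Call a 'function' that lives on disk: argv in, stdout out."""
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        greet = Path(tmp) / "greet.py"  # hypothetical "function" name
        greet.write_text("import sys\nprint('hello', sys.argv[1])\n")
        before = call(greet, "world")
        # Redefine the function while the caller keeps running.
        greet.write_text("import sys\nprint('goodbye', sys.argv[1])\n")
        after = call(greet, "world")
        return before, after


if __name__ == "__main__":
    print(demo())
```

The caller never reloads anything: the next call simply picks up whatever definition is on disk at that moment. In a real system the file would of course live somewhere permanent rather than in a temporary directory, which is what makes it survive a reboot.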

(Compare that to C, C++, Python, or basically every other environment out there. Yes, Python has a REPL, technically, but "let's just run the entire thing again" is still the norm. Not so with either UNIX or LISP machines. Or emacs, for that matter.)

As you might have noticed, UNIX the language operates primarily on strings and files. LISP is all about lists. C cares a lot more about bytes and pointers. Of course, you can use any of them to interface with anything else, but there is a difference between "easy" and "possible".

Basically, the UNIX calling convention is "a list of strings, plus a file for stdin and another for stdout". Of course, you could do this in any programming language, but UNIX also helps you enforce an approximation of functional purity by preventing you from sharing memory between pieces of code on your stack, by giving each level different pieces of virtual memory. As a result, you have to turn your data into text and pass it through either the string parameters or files. (If this reminds you of RPC and marshalling... well, it's exactly what it is.) LISP machines are a lot more lenient in that regard, making everything faster, but also lacking many of the security benefits.
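Here is a rough sketch of that calling convention as seen from Python. The child "function", the `append_via_process` helper and the choice of JSON as the text format are all assumptions made for this example; UNIX itself only cares that strings and byte streams go in and out.

```python
import json
import subprocess
import sys

# Hypothetical child "function": reads JSON text from stdin, appends its
# first string argument, and writes the result as JSON text to stdout.
CHILD = (
    "import json, sys; "
    "data = json.load(sys.stdin); "
    "data.append(sys.argv[1]); "
    "json.dump(data, sys.stdout)"
)


def append_via_process(items: list, extra: str) -> list:
    """Marshal items to text, call the child, unmarshal its answer."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, extra],  # the "list of strings"
        input=json.dumps(items),               # the stdin "file"
        capture_output=True,                   # the stdout "file"
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    original = [1, 2]
    print(append_via_process(original, "three"))
    print(original)  # the child only ever saw a serialized copy
```

The child appends to its own copy of the list; the caller's list is untouched because the two never share memory. That is the "approximation of functional purity" in practice, paid for with a round trip through text.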

UNIX, by default, is an interpreted (and, thus, slow) language. Just like Java has JNI and Python has extensions, you can write UNIX functions in different languages and just call them from UNIX... and then you can call back into UNIX-land (see system(3)).
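As a small illustration of calling back into UNIX-land from an "extension" language, here is a Python sketch. It uses `subprocess.run` with `shell=True`, which on POSIX systems hands the command string to `/bin/sh`, much like the C library's `system` does. The `run_unix` helper and the example pipeline are made up for this post, and the sketch assumes a POSIX shell with `printf`, `sort` and `head` available.

```python
import subprocess


def run_unix(snippet: str) -> str:
    """Evaluate a piece of UNIX (shell) code and return its stdout as text."""
    result = subprocess.run(
        snippet,
        shell=True,           # POSIX: executed via /bin/sh -c
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Compose three UNIX "functions" with pipes, from inside Python.
    print(run_unix("printf '%s\\n' b a c | sort | head -n 1"))
```

Note that the snippet is handed over as a single string and parsed by the shell, so anything interpolated into it needs the usual quoting care.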

Not surprisingly, UNIX the language doesn't quite extend down to the kernel level (well, it's slow, and it relies on a lot of services that you can't really write in UNIX itself, unlike LISP). So it also comes with another programming language, C, for the low-level parts.

C in itself is really not that great of an environment: think of single-executable-image embedded systems with no "OS" or file system; if you want to change anything, it's a restart / reflash, and you need gdb if you want to inspect anything. However, it works reasonably well both for extension modules in UNIX environments (those they call "programs") and for kernels (mostly monolithic ones). Of course, the fact that you need two programming languages to construct a single system makes such systems a lot more brittle than "LISP all the way down".

Another downside of UNIX is that it doesn't come with a decent debugger. You can, technically, dump out the list of running functions in a tree, but you can't quite step through the stack of shell scripts. Even "C only" is better in this regard.

Furthermore, while the UNIX model is nice for batch jobs / function calls, the "every stack level is a separate piece of virtual memory" model doesn't quite lend itself to writing long-running code. As a result, systems written purely in UNIX (e.g. the SysV init system) are fairly messy; most GUI code is written entirely in other languages, with the only function call exposed to the UNIX side being "start this GUI". There have been efforts to add more flexible RPC schemes that would fit long-running GUI programs better (dbus, etc.); they're not overly well-integrated with the existing system though, and this way we're just adding more calling conventions (UNIX, C, and now dbus). No wonder that GUIs are brittle, hard to interoperate with, and fairly static: this is what happens if you're forced to think in C instead of UNIX or LISP.

... so...

UNIX is nice and fairly dynamic; it's a lot closer to LISP machines than C. Unfortunately though, somehow we convinced ourselves that UNIX is not a mediocre programming language / environment but a great operating system, and therefore we aren't even trying to replace it with another programming language.

Even though we already did this once.

This is post no. 18 for Kev Quirk's #100DaysToOffload challenge.

... comments welcome, either in email or on the (eventual) Mastodon post on Fosstodon.