Bridges to Browser Islands

2024/08/03

Once upon a time, I had some friends over; we were listening to music, in the form of mp3s on a computer. (This fact might also help put a date on when this must have happened.) They were messing around with the player, picking songs, possibly poking at things on the internet. (Imagine a Windows XP desktop.)

Everything went well until one of them pulled a CD out of a backpack, with something to look at. He put it in, opened the file manager: the CD was nowhere to be found. The drive continued reporting being empty.

I do happen to remember some smug, gloaty feelings on my part, watching them try to figure out what went wrong and not yet discovering the important part of the setup. (I made it so seamless.)

(Not that it contained a lot of work from me... but still.)

The important part was that... what they were looking at was not the computer. The computer itself was sitting in a dorm room a couple of miles away; this was just a full screen remote desktop window on another computer. Thanks to how remarkably good RDP is though, not only wasn't it obvious from latencies (at least if you were not looking for it), but even playing music worked perfectly well.

As it happens, this situation is remarkably similar to browser windows these days. If, instead of running the browser locally, you ran it on a remote server, streaming video with low latency between the two... you wouldn't lose a lot. There is really not a lot of interaction between the operating system and the browser that wouldn't be replaceable by just looking at its outputs.

Browsers go increasingly out of their way to be their own little islands. They refuse to use the operating system window manager; they handle their tabs separately instead. (Aren't tabs just a simplified tiling window manager after all?) They save cookies sent to them by websites; then, they keep them safe from the rest of the operating system by storing them in obscure databases and encryption. Bookmarks? They could be just files; instead, they implement half of a file manager just to organize them.

If you look at Chrome OS, it's just Chrome with a simple file manager & some additional settings pages. The fact that you can get away with this as an operating system shell says something about how complete browsers are in themselves.

With all the downsides of running something that could have been a set of remote desktop sessions.

Curl

To begin with, have you tried querying, via HTTP, anything that needs a login, from the command line?

Surprisingly, browser developer tools do have the capability of giving you a curl invocation that you can use from the command line to make the same request as the browser did. It even includes all the cookies you need for logging in.

It also includes the cookies you do not need to log in. Overall, you get an ugly wall of text; optimally, you can find the token that you still need to make the request successfully. You remove some of them. Still works. Remove more: access denied. OK, let's put them back. Iterate.
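The remove-and-retry loop above is mechanical enough to script. Here is a minimal sketch in Python, assuming you have already pasted the cookies from the copied curl command into a dict, and that you can write a probe function that repeats the request and reports whether it still succeeds. The names minimize_cookies and still_works are made up for this illustration, and the demo uses a fake probe instead of a real site.

```python
from typing import Callable, Dict


def minimize_cookies(
    cookies: Dict[str, str],
    still_works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Greedily drop cookies one at a time, keeping only the ones the request needs.

    `still_works` is a caller-supplied probe (hypothetical here): it should
    repeat the request with the given cookies and return True on success.
    """
    if not still_works(cookies):
        raise ValueError("even the full cookie set fails; log in again first")
    needed = dict(cookies)
    for name in list(cookies):
        trial = {k: v for k, v in needed.items() if k != name}
        if still_works(trial):
            needed = trial  # the request survived without this cookie
    return needed


def cookie_header(cookies: Dict[str, str]) -> str:
    """Format cookies the way a Cookie request header expects them."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


if __name__ == "__main__":
    # Stand-in data and probe; a real probe would send the HTTP request.
    copied = {"session": "abc123", "csrf": "xyz", "theme": "dark", "_ga": "GA1.2.3"}

    def fake_probe(subset: Dict[str, str]) -> bool:
        return "session" in subset and "csrf" in subset

    print(cookie_header(minimize_cookies(copied, fake_probe)))
```

One pass is enough only if removing a cookie never makes a previously failing request succeed, which seems like a safe bet for login tokens. It does nothing about expiry, of course.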

Worse yet: by turning this into a shell script, you can get something working... and then the next time you try it out, the tokens will have expired, nudging you in a subtle way towards just giving up. Yes, you could use the browser to log in again and copy around more tokens, but then this is the kind of thing that you either automate or just don't end up doing.

Alternatively, you could use curl itself to log in and get the cookies / tokens. This is typically one-off hackery that might or might not work without JavaScript, though. A lot of sites are not especially incentivized to have this work either, and will detect you as someone nefarious (even if all you wanted was to avoid dealing with their terrible UI).

Browsers, with the right kind of APIs, could solve this problem extremely easily. After all, they know exactly how to make these requests; they have the cookies stored too. All that we would need is a command line tool that, given the context of a domain or an already open tab, sends out the request. Imagine chrome-curl https://your-website.com/some/api Just Working (or, if it fails, chrome-curl --interactive showing you a GUI login window with the actual site, and then doing its job).

Shell

Likewise, imagine being able to run arbitrary JavaScript in any tab you want, straight from the command line. See a table? Grep through its contents. Autofill forms from a bash script. Save the list of your open tabs to a text file.

It could work from the browser's side too. How about right-clicking something that you match with a regex pattern on the browser side, having the option to run a shell command on it?

You might be able to do some of these things currently, with hacky and underpowered browser extensions, from third parties. (Do we trust these extensions though?)

Of course, none of this fits into the browser security model... but then said security model is about protecting the user from websites, not websites from the user. They are called "user agents" for a reason, after all.

Filtering & inspection

Being able to run as part of a page is neat enough; having power over the session can be even more powerful though.

Take, for example, a messaging app, using a custom JSON-based protocol to communicate with its server. Wouldn't it be nice if we could just dump all these messages into another app? Or, alternatively, get notifications when new elements are appended to a part of the page?

This functionality is already available through a debugging interface, which has roughly the same power as developer tools in browsers. For this to work, you need to launch the browser with special flags though, and keep a (possibly insecure) port open. Could this instead happen on the fly, dynamically, per tab? With the tab file tree mounted as a file system that you can use standard tools to look at?
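To give a taste of what the debugging interface looks like today: Chromium-based browsers started with --remote-debugging-port expose a small HTTP endpoint that lists open targets as JSON. The sketch below, in Python, saves the open tabs to a text file that way. The port number 9222 and the function names are just choices for this example, and it only works if the browser was launched with that flag in the first place.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_targets(port: int = 9222) -> List[Dict]:
    """Ask a browser started with --remote-debugging-port=<port> for its targets."""
    url = f"http://127.0.0.1:{port}/json/list"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def tab_lines(targets: List[Dict]) -> List[str]:
    """Keep only real tabs ("page" targets) and format them as URL<TAB>title."""
    return [
        f"{target.get('url', '')}\t{target.get('title', '')}"
        for target in targets
        if target.get("type") == "page"
    ]


def save_tabs(path: str, port: int = 9222) -> int:
    """Write one line per open tab to `path`; returns the number of tabs saved."""
    lines = tab_lines(fetch_targets(port))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)


if __name__ == "__main__":
    # Requires a browser already running with the debugging port open.
    print(save_tabs("tabs.txt"), "tabs saved")
```

The resulting file is something grep can finally look at. Each target in the listing also carries a WebSocket URL for the full protocol, which is where running JavaScript in a tab or watching network traffic would start.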

Tab switching

Things are slowly improving, since we last checked on the topic some time back. There are browsers that now have the capability of using vertical tabs! (Something that Windows 95 came up with for the taskbar approximately... 30 years ago.)

Progress is not obviously forward though. Does anyone remember the Android feature where you could see your browser tabs merged with apps, in around 2014? (Imagine each tab as its own browser window activity in the app switcher; you could get a pretty neat global system / browsing history by just scrolling through "apps" leftwards.) At the time, I was very happy to see how things were finally being done in the correct way. Well... instead of fixing shortcomings that some people were mildly upset about, they discontinued this in 2016 (evoking sadness from a likely distinct group of people). Of course, it's impossible to treat that as an option that some people like and some people don't, because... software is terrible and the only way we can keep it working is forcing the exact same thing on everyone.

I did not give up though. In fact, on my work Mac, I've had an extension (for about 1.5 years) that takes apart every multi-tab window to separate single tab ones. The user experience is... questionable in ways, but at least it trains you to be open to the possibility of seeing multiple browser windows at the same time, a skill that's often useful but fairly forgotten.

The main point here, though, is that the operating system window manager should integrate with browsers, instead of not knowing anything about the tabs inside of them. This goes both ways: why shouldn't it be possible to add a normal desktop app as a browser tab? Why are "tabs" a browser thing? And why not organize them as tabs but have all of them show up separately in UIs like the macOS Mission Control?

Tab saving

Open tabs are clearly not equivalent to just bookmarks. Mobile browsers, in particular, are exceedingly good at saving tab state, reloading them from disk months and many browser upgrades later, with content that we do not even fetch from the remote side.

Under the hood, there probably are just some files somewhere.

Could they be our files?

Obviously, keeping compatibility between browser versions and different computers is hard. We seem to have solved a subset somehow though? Being able to freeze a tab, save it to a shared drive, and reopen it from another computer would be pretty cool, especially if you can do it en masse with an entire browser window. Actual backups for browser tabs!

Extensions & user scripts

There is a lot of low-hanging fruit, even on the less ambitious side.

Think browser extensions. They are surely just files somewhere. Can you add a directory and have the browser just recognize it as a browser extension? Of course not! You need to jump through various security hoops to install an extension that is not packaged & signed; then, your extension might still go away if you restart the browser though.
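For reference, this is roughly how little an extension is on disk. The Python sketch below writes a minimal unpacked extension: a Manifest V3 manifest.json plus a single content script. The extension name, the match pattern, and the script body are placeholders for illustration. Chromium-based browsers will take such a directory through "Load unpacked" on the extensions page once developer mode is on, which is one of the hoops in question.

```python
import json
from pathlib import Path


def write_extension(directory: str, match_pattern: str, script_body: str) -> Path:
    """Create a minimal unpacked extension: manifest.json plus one content script."""
    ext_dir = Path(directory)
    ext_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "manifest_version": 3,
        "name": "Local tweaks (example)",  # placeholder name
        "version": "0.1",
        "content_scripts": [
            {"matches": [match_pattern], "js": ["content.js"]},
        ],
    }
    (ext_dir / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )
    (ext_dir / "content.js").write_text(script_body, encoding="utf-8")
    return ext_dir


if __name__ == "__main__":
    # Placeholder domain and script; adjust both to taste.
    created = write_extension(
        "local-tweaks",
        "https://example.com/*",
        'document.body.style.fontFamily = "serif";\n',
    )
    print("wrote", created)
```

Two files in a directory. Everything past that point is policy, not technology.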

This situation is even more silly with user scripts. They are somewhat minimalistic alternatives to extensions, to load scripts easily on particular domains. Imagine adding some extra links next to given buttons, or changing some colors around. Are these files? Or did we yet again re-implement the file system, together with an app store thing and a questionably subpar editing experience?

It's not even about just whether things are files or not though. For example, CSS is nice because you can style webpages independent of the actual markup... In fact, there is userContent.css, a file that you could edit and which will apply to every web page you visit. Assuming you find the exact about:config setting that enables it, create it at the right obscure profile path and ... possibly restart the browser to make it apply?

(Now imagine right clicking something on a page, selecting "Customize...", and being able to modify fonts, colors, and so on, without even having to figure out what the exact CSS selector is. Or writing CSS at all, for that matter.)
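The userContent.css steps above can at least be scripted. Here is a rough Python sketch that appends a per-domain rule to the file inside a Firefox profile directory. To my knowledge, the about:config setting in question is toolkit.legacyUserProfileCustomizations.stylesheets and the file lives in a chrome subdirectory of the profile; treat both as assumptions to verify against your Firefox version. The profile path in the demo is a placeholder, and the function names are made up.

```python
from pathlib import Path

# Assumed pref name; it has to be set to true in about:config by hand.
PREF_NAME = "toolkit.legacyUserProfileCustomizations.stylesheets"


def domain_rule(domain: str, css_body: str) -> str:
    """Wrap CSS so that it only applies to pages on the given domain."""
    return f'@-moz-document domain("{domain}") {{\n{css_body.strip()}\n}}\n'


def append_user_css(profile_dir: str, domain: str, css_body: str) -> Path:
    """Append a per-domain rule to <profile>/chrome/userContent.css."""
    chrome_dir = Path(profile_dir) / "chrome"
    chrome_dir.mkdir(parents=True, exist_ok=True)
    target = chrome_dir / "userContent.css"
    with target.open("a", encoding="utf-8") as handle:
        handle.write(domain_rule(domain, css_body))
    return target


if __name__ == "__main__":
    # "path/to/your/profile" is a placeholder; about:profiles lists the real one.
    written = append_user_css(
        "path/to/your/profile",
        "example.com",
        "body { font-family: serif !important; }",
    )
    print(f"appended to {written}; remember to enable {PREF_NAME} and restart")
```

The restart at the end is, sadly, still on you.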

It's your computer

Web pages started out as a weird, remote document format that your computer could read and process. These days, a lot of them turn into an efficient form of remote desktop, giving you very limited control over how your data is being handled.

As it happens though, you're still not looking at a compressed video stream, protected by DRM. You still have local tools that are supposed to help you process and filter information that you encounter. This includes websites.

Sadly, it's not in the interests of many companies to give you tools for this. Obviously, if your main business model is about showing users things that they don't want to see, you are not going to be especially enthusiastic about adding more control to the browser you happen to develop.

Browsers are still reasonably open source though. Maybe we could add functionality like this?