Binary is Fun

2021/04/21

... this one is about formats, curiosity and how many barriers aren't even there.

Back when I was starting out playing around with computers, there were things that were easy to understand. There were drive letters, you could look at files using Norton Commander, and you could write programs in e.g. Turbo Pascal, which ran line by line in DOS, could print text to the screen, and could even draw some things.

And then there were the things that were magic. Windows programs, for example... they were plain EXE files, too, but somehow they produced entire window frames that could run at the same time as other apps, and they did these cool Official Windows-Looking Things, like buttons and title bars. My little Turbo Pascal programs could definitely not do that. Surely you needed some Magic Artifact to impart this quality onto the code you wrote.

Well, eventually, I did get my hands on this magic tool, called Visual C++, along with some docs to explain which arcane incantations ("CreateWindowEx" and its ilk) to use to do this. However, the premise sounds pretty much unchanged: while mere mortals can write a lot of text-based formats (source code, HTML, etc.), any kind of binary format (e.g. program code, MP4 files...) is something More; one does not simply write programs that generate these. Want to process MP4s? Well, you should use libav, or... any of the numerous libraries that can deal with this.

However... the thing is: they're all just a sequence of bytes. With the right docs, you can read and write the right sequence of bytes. Which sounds extremely scary if you look at how many lines of code libav has... or how many pages the relevant ISO standard runs to.

Those have to cover everything, though. You don't. Especially when writing files, you can pick and choose which features you'll support; suddenly, you can set most fields to 0 and still have a reasonable output file. Just like you don't have to learn the entire who-knows-how-many pages of the HTML5 spec to write a hello-world webpage.
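To make the "most fields are 0" point concrete, here is a minimal sketch that writes a complete image file by hand. BMP is my pick only because its header is tiny; the function name make_bmp is made up for this example, and it only handles the simplest case (uncompressed, 24-bit, one solid colour).

```python
import struct

def make_bmp(width, height, rgb=(255, 0, 0)):
    """Build a solid-colour, uncompressed 24-bit BMP and return it as bytes."""
    r, g, b = rgb
    row = bytes([b, g, r]) * width        # pixels are stored as blue, green, red
    row += b"\x00" * (-len(row) % 4)      # every row is padded to a multiple of 4 bytes
    pixels = row * height

    headers_size = 14 + 40                # file header + info header
    # File header: magic, total file size, two reserved fields (0), offset of pixel data.
    file_header = struct.pack("<2sIHHI", b"BM", headers_size + len(pixels), 0, 0, headers_size)
    # Info header: its own size, width, height, planes (always 1), bits per pixel,
    # then six fields that can all stay 0 for a plain uncompressed image.
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0, 0, 0, 0, 0, 0)
    return file_header + info_header + pixels
```

Writing the result of make_bmp(64, 64) to a file opened in "wb" mode should give you something an image viewer will open. That's 54 bytes of header, and eight of the sixteen fields are just zero.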

For example... you can add subtitles to an MP4, or extract them from it, without even looking at the H.264 video stream (which is a big win; video compression is one of the trickier things out there). Same for making some (lossless!) cuts (... as long as you're doing it at the keyframes).
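The reason this works is that an MP4 is a sequence of "boxes", each starting with a 32-bit big-endian size and a four-character type, so you can skip over whatever you don't care about. A rough sketch of walking them, assuming the whole file is already in memory as bytes (iter_boxes is a name I made up, and this does no real error reporting):

```python
import struct

def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_offset, payload_size) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means the real size follows as a 64-bit integer.
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means "this box runs to the end of the file".
            size = end - pos
        if size < header or pos + size > end:
            break  # truncated or malformed; stop instead of guessing
        yield box_type.decode("latin-1"), pos + header, size - header
        pos += size  # jump straight over the payload, whatever it contains
```

On a typical file the top level is just a handful of boxes such as ftyp, moov and mdat. Container boxes like moov hold more boxes in their payload, so you can call iter_boxes again with the payload's offset and end to go one level deeper, all without ever decoding a single video frame.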

Even if you don't end up writing your own version of every library dealing with binary formats, you'll get some insight into how things actually look on disk. Maybe that'll help you understand how some weird ffmpeg flags work. Or... maybe you'll throw out that 40 MB library dependency that was, so far, the only tool that would make a magic change in a file for you, which... turns out to be a single bit flip.
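And a bit flip really is as small as it sounds. A sketch, with a made-up helper name and bit 0 meaning the least significant bit; which offset and which bit you need is exactly the part you'd get from the format's docs:

```python
def flip_bit(data, offset, bit):
    """Return a copy of data with one bit toggled in the byte at the given offset."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be between 0 and 7")
    patched = bytearray(data)
    patched[offset] ^= 1 << bit   # XOR with a one-bit mask toggles just that bit
    return bytes(patched)
```

Read the file in "rb" mode, pass the contents through flip_bit, write the result back out, and that's the whole tool.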

Perhaps don't start with writing a C compiler though. Just looking at the Kaitai web IDE is a really simple, one-click way of getting exposed to all of this, with some file samples, even. Have fun!

This is post no. 6 for Kev Quirk's #100DaysToOffload challenge.

... comments welcome, either in email or on the (eventual) Mastodon post on Fosstodon.