
Commas are Bad

2024/06/29

I do think humanity is putting a pointlessly large amount of effort into parsing text. Instead of putting up with the minor (and temporary) nuisance of having to deal with parentheses, or storing things internally as a tree to begin with, we keep coming up with more and more barely parseable formats for tree-shaped information.

Some of these are somewhat okay. (I'm probably more okay with HTML than most, for example.)

There is one syntax element, though, that I think is particularly stupid, for multiple reasons.

Infix operators

Imagine expressions like


int a = 12 * x + 3 + 4;
          

Obviously, you could write it like...


(setq a (+ 3 4 (* 12 x)))
          

... which is barely longer, and also lets you pretend that addition and multiplication are functions, not magic special operators that eventually get compiled down to... functions. If you want additional readability, you can also do something like this:


(setq a
      (+ 3
         4
         (* 12 x)))
          

... but if you appreciate readability, you probably already wrote it like...


int a = (12 * x) + 3 + 4;
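
The "operators are just functions" idea can be sketched in C++ as well. This is only an illustration: the names add, mul and compute are made up for this example, and it assumes a C++17 compiler (for fold expressions).

```cpp
// Hypothetical helpers, not part of any library.

// Variadic sum, roughly like Lisp's (+ a b c ...). An empty call yields 0.
template <typename... Ts>
constexpr auto add(Ts... xs) {
    return (xs + ... + 0);
}

// Variadic product, roughly like Lisp's (* a b c ...). An empty call yields 1.
template <typename... Ts>
constexpr auto mul(Ts... xs) {
    return (xs * ... * 1);
}

// The same computation as `12 * x + 3 + 4`, written prefix-style.
constexpr int compute(int x) {
    return add(3, 4, mul(12, x));
}
```

With these, compute(2) gives the same result as 12 * 2 + 3 + 4. Note that C++ still insists on commas between the arguments.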
          

This article isn't really about demonstrating how nice Lisp is, though. (There are some other articles that do that.) So far, it's just been an intro to the fact that infix operators exist. (They are called that because they sit between their arguments in code.) Addition and multiplication are such operators in a lot of programming languages.

Commas

Commas are an infix operator. In fact, you can overload them in C++, despite books explicitly telling you that you should probably not do that.
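
To make the overloading part concrete, here is a minimal sketch. The IntList type and build_with_commas function are hypothetical, invented just for this example; it is not a recommendation.

```cpp
#include <vector>

// Hypothetical type for illustration: collects ints via an overloaded comma.
struct IntList {
    std::vector<int> items;
};

// User-defined comma: `list, 5` appends 5 and returns the list,
// so `list, 1, 2, 3` chains from left to right.
IntList& operator,(IntList& list, int value) {
    list.items.push_back(value);
    return list;
}

// Builds the sequence 1, 2, 3 using nothing but the overloaded comma.
std::vector<int> build_with_commas() {
    IntList list;
    (list, 1, 2, 3);
    return list.items;
}
```

Here the comma quietly means "append", which is probably a good part of why the books advise against doing this.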

If you happen to be using C++, you can also do things like


a, b
          

which happens to be an expression that evaluates both a and b, and has the value of b.

That is not the typical use of a comma, though.
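
A small sketch of the built-in comma operator at work. The helper names note and comma_value are made up for this example.

```cpp
#include <string>

// Hypothetical helper: appends `name` to the trace and returns `value`.
int note(std::string& trace, char name, int value) {
    trace += name;
    return value;
}

// The comma between the two calls is the actual comma operator:
// the left operand is evaluated first, then the right one,
// and the right one's value is the result of the whole expression.
int comma_value(std::string& trace) {
    return (note(trace, 'a', 1), note(trace, 'b', 2));
}
```

After calling comma_value on an empty string, the trace shows that both sides ran, left first, and the return value is the one from the right-hand call. The commas inside note(trace, 'a', 1) are plain argument separators, not operators.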

Their badness

Take this JSON:

[
  "one",
  "two",
  "three"
]
          

If we were to add a fourth element, we would need to add a comma to the line with three and add a fourth line.

[
  "one",
  "two",
  "three",
  "four"
]
          

If we ever want to reorder elements, we need to take explicit care of appropriately injecting or removing commas; we get syntax errors if we miss any or we add too many. At least some JSON parsers let you add trailing commas.

[
  "one",
  "two",
  "three",
  "four",
]
          

Of course, that's only some JSON parsers. The official specification does not allow this.

As for whether all this injecting and reordering of commas is worth it:

[
  "one"
  "two"
  "three"
  "four"
]
          

... it's totally not. We can easily parse everything without any commas whatsoever.
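
As a rough sketch of how little is needed, here is a parser for the comma-free list above. The parse_list function is hypothetical, and it assumes a C++17 compiler, a flat list of double-quoted strings, and no escape sequences inside them. It is nowhere near a full JSON parser.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parser for a bracketed list of double-quoted strings.
// Elements simply follow one another, with optional whitespace in between;
// commas are not part of this grammar at all.
// Assumptions: no escape sequences inside strings, no nested lists.
std::optional<std::vector<std::string>> parse_list(const std::string& text) {
    std::size_t i = 0;
    auto skip_space = [&] {
        while (i < text.size() &&
               std::isspace(static_cast<unsigned char>(text[i]))) {
            ++i;
        }
    };

    skip_space();
    if (i >= text.size() || text[i] != '[') return std::nullopt;
    ++i;

    std::vector<std::string> items;
    while (true) {
        skip_space();
        if (i >= text.size()) return std::nullopt;          // missing ']'
        if (text[i] == ']') { ++i; break; }                 // end of list
        if (text[i] != '"') return std::nullopt;            // only strings allowed
        std::size_t end = text.find('"', i + 1);
        if (end == std::string::npos) return std::nullopt;  // unterminated string
        items.push_back(text.substr(i + 1, end - i - 1));
        i = end + 1;
    }

    skip_space();
    if (i != text.size()) return std::nullopt;              // junk after ']'
    return items;
}
```

Feeding it the four-line list above yields the four strings in order. Lines can be added, removed or reordered without touching their neighbours. A comma in the input is simply rejected by this sketch.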

Do we even need them?

Consider the use of commas in the following common cases.


    void some_function(int a, SomeClass b, const char x)
    {
       some_call(b, x);
       std::vector list = {1, 2, 3};
    }

None of the commas in the above example is an actual operator. If we removed them completely, we would still get text that's about as easy to parse:


    void some_function(a:int b:SomeClass x: const char)
    {
       some_call(b x);
       std::vector list = {1 2 3};
    }

(... with some minor modification to the parameter list.)

Are they all bad?

My theory is that most of the badness of commas comes from the fact that, most of the time, they don't actually do anything apart from delimiting text.

For example, the + operator corresponds to an actual addition operation. If you encounter one, you typically need to add something, concatenate strings, etc.

There is no such operation when you see a comma-delimited list. Adding the 0th element is the same kind of operation as adding the 1st and the 2nd one.

Allowing trailing commas is a reasonable half-solution to this (as demonstrated by the number of places that allow trailing commas). Replacing them with something else might occasionally be a better solution, though?
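
C++ itself is one of those places, at least partly. A minimal sketch (the names Color and numbers are made up for illustration):

```cpp
#include <vector>

// Trailing comma in an enumerator list (allowed since C++11).
enum Color {
    Red,
    Green,
    Blue,
};

// Trailing comma in a braced initializer list.
std::vector<int> numbers() {
    return {
        1,
        2,
        3,
    };
}
```

Both compile fine, and in both cases lines can be reordered or appended without fixing up the neighbours. Function calls get no such allowance: some_call(b, x,) is still a syntax error.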