Writing Code is a Conversation
Especially for simpler projects, writing code feels like you're building something. You sit down, you make a plan, and you just flesh out all the parts you need to take care of. It's a top-down process, ideally.
Obviously, despite our superpowers of doing things in our heads, this doesn't always go smoothly. Sometimes you need to backtrack and try another solution. Sometimes what you wrote just doesn't work, so you need to debug it line by line, printing a bunch of logs and staring at them for a long time. But overall, you start from somewhere, you write some code, and then you reach your goal.
Imagine, however, a project that is way larger than what a single person can write in an afternoon.
Let's say you want to add a feature. You're now staring at a ginormous code base, with hundreds of files, each having hundreds, maybe thousands of lines of code.
Where do you even start? How does this thing even work?
Suddenly, what you're doing is not engineering anymore. It's a kind of... science.
You're poking at parts of the code... in ways that are not unlike the way scientists try to figure out what a particular neural pathway or gene does in a complex organism. There are just so many moving parts that you do not understand; you are trying to filter out the signal from the noise. You do not always succeed.
For example, you are trying to add a new button to the user interface. You start by looking for a unique-looking other button that is already there, search for its title in the code, and try adding a new one the same way the original was added.
As it turns out, you might have found the wrong button... one that has the same title but serves a completely different purpose. Maybe you found the old version of said button, which used to be functional but is not on an active code path anymore. Just by looking at the code, it's not always easy to tell which is which.
Once you have a piece of code that you know is at least somewhat related to what you are trying to accomplish, you start poking at it. You modify it a little bit. Can you print the pieces of data that are going through it? Do they look like the things that you expect to see? What kind of data are we even talking about? Why is everything wrapped in this odd three-layer abstraction? What's the point? How do I even print it out?
There are surprises on the way. Your button works on everything but database rows that were added before last October. This has a good reason. You do not know what the reason is. In fact, you did not anticipate that this could even happen.
You eventually get to learn about it though... it's... obvious in retrospect.
You have a range of possible avenues to gain information about what's going on in the code. You can obviously just... look at it. Read it from start to finish. For a certain project size though, this is not especially feasible. You need to be more specific about what you're looking for. You can follow along which piece of code is calling or is called by which piece of code. You can just grep for names (if it's easily greppable, that is).
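If grep itself is not at hand, the same idea fits in a few lines of Python. This is only a sketch: the function name grep_tree, the default .py suffix filter, and the Button('Save') pattern in the usage comment are all made up for illustration.

```python
import re
from pathlib import Path


def grep_tree(root, pattern, suffixes=(".py",)):
    """Yield (path, line_number, line) for every line matching the regex.

    `suffixes` is an assumed filter so that binaries and build
    artifacts are skipped; adjust it to whatever the project contains.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not (path.is_file() and path.suffix in suffixes):
            continue
        # errors="replace" keeps the search going on oddly encoded files.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()


# Hypothetical usage: where is the "Save" button constructed?
# for path, number, line in grep_tree("src", r"Button\('Save'\)"):
#     print(f"{path}:{number}: {line}")
```

Note that this finds every match, including the dead code path and the unrelated button with the same title. Telling them apart is still up to you.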
Staring at static code only gets you so far though: eventually, you want to actually run the thing! Possibly, log some pieces of data, or use a debugger to inspect variables while it's running. Even just... modify what it does, and try exercising it; see what you broke with your modifications.
(This is an underrated use of unit tests, by the way: they are not just for checking whether the code you modify is correct or not, but also show you ways in which it will be called and what kind of data it typically runs on.)
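To make that concrete, here is a minimal sketch of tests that double as documentation. The function normalize_title is a hypothetical stand-in for something buried in a large code base; the point is that the tests tell you what inputs it is expected to see, including ones you might not have thought of.

```python
import unittest


def normalize_title(title):
    """Hypothetical function under study: tidy up a button title."""
    return " ".join(title.split()).title()


class NormalizeTitleTest(unittest.TestCase):
    # Reading these tells you that callers pass messy whitespace...
    def test_collapses_whitespace(self):
        self.assertEqual(normalize_title("  save   changes "), "Save Changes")

    # ...and that an empty title is apparently a legitimate input.
    def test_empty_title_is_allowed(self):
        self.assertEqual(normalize_title(""), "")


# Run the tests explicitly so the snippet works when pasted into a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTitleTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Changing normalize_title and rerunning is also a cheap way to ask the code a question: whichever test breaks tells you which assumption you just violated.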
Even once you roughly know what parts you want to modify and how, the best solution might still not be obvious. There might be multiple ways you could set up which class calls which class, and how you divide up the responsibilities between multiple files. Large enough code bases might still supply some extra surprises here, in the form of arcane APIs whose use you need to figure out as you go. At first, the code you write might not even compile... indicating that the model of the rest of the system you had in your head is still not especially accurate.
You fix the compilation errors. The model gradually improves.
By the end of this process, you have a reasonably good idea of what you are actually doing. You have built a feature that seems to actually work! Moreover, the unit tests, which happen to remember a bunch of things that you still do not know about, agree with you on this.
At this point... you realize that you have spent 17 hours writing a hundred lines of code. The code is not trivial... but did it really need to be 17 hours?
Another way of looking at it though is that... what you have been doing is mostly not writing code but talking to it. In a way, you're asking both the code and the world surrounding it to update the model of both of them in your head; in exchange, you modify the code to better reflect your requirements and how the world works. The modifications to the code are just the small and easy part though. The hard part is asking it the right questions.
In fact, this is why, if your codebase is complicated enough, just talking to people who know how things work can make things disproportionately easier. Not because they will write the code for you... but because talking to them is an easier way to get the same answers you would otherwise have to ask the code for.
This is an area of software engineering that could definitely use some extra effort. We have great programming languages, through which humans can express their thoughts towards the machines; the ways we funnel information produced by the code back towards humans are a lot less developed though.
Do we have good visualization tools for code structure? Anything for the data it processes?
Yes, there are logging frameworks... how do we do this in a nice scalable way though? Can you spot the one request in 10,000 that is not working for some reason? The gap in this list of 600 timestamps?
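The timestamp question, at least, is the kind of thing a few lines of code answer faster than your eyes do. A rough sketch, assuming the timestamps are plain numbers (seconds, say) that normally arrive at a regular pace; the name find_gaps, the factor of 3, and the sample data are all invented for the example.

```python
from statistics import median


def find_gaps(timestamps, factor=3.0):
    """Return (before, after) pairs whose spacing is far above typical.

    "Typical" is taken to be the median spacing; `factor` is an
    arbitrary threshold you would tune for your own data.
    """
    ordered = sorted(timestamps)
    pairs = list(zip(ordered, ordered[1:]))
    if not pairs:
        return []
    typical = median(after - before for before, after in pairs)
    return [(before, after) for before, after in pairs
            if after - before > factor * typical]


# 600 made-up timestamps, one per second, with a hole after t=417.
stamps = [float(t) for t in range(605) if not 418 <= t <= 422]
gaps = find_gaps(stamps)
print(gaps)  # [(417.0, 423.0)]
```

The same shape of script works for the one request in 10,000: compute what "normal" looks like, then print only what deviates from it.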
Does your source file take a minute of compilation to realize that you mistyped a variable name? Or that you're missing a header? Or that this template is not applicable to this specific type?
Actually, does it ever realize this, or will it happily compile and just die at runtime after five minutes of loading data, giving you a mildly inscrutable stack trace?
Obviously, if you could keep all your code (and its environment) in your head, most of these wouldn't be problems. This is not especially feasible most of the time though.
The second best thing for this is a tight enough feedback loop which lets you fix them as quickly as possible. Almost like adjusting gradients during training. You do not have to have an entirely complete or accurate map of the landscape in order to move along the gradient, and yet you will be able to optimize your system into a shape that you end up liking.
This is why visualizations are nice. Same thing with clangd. And fast compilers. And unit tests. The faster and the more efficiently you can Do Science to your code, the faster you will figure out what code you'll actually want to write.
Actually writing it is just the easy part.