The duck blog: Friday, April 19, 2024 post

Is readability a characteristic inherent to languages or code?

The topic of readability is a lot more complex than simplistic statements like "TypeScript makes code more readable" would have you believe. If a single thing (e.g., your choice of programming language) fixed all readability issues, you would expect nobody to oppose it. Yet plenty of people find TypeScript harder to read, myself included.

Since answering such statements with just "no" or "it depends" doesn't bring about any useful insight, I've decided to jot down my notes about code readability.

Readability in natural languages

If we look at natural language, there are many factors that influence readability. For instance, if you're not familiar with 日本語, you may have trouble reading text in that language at any level. Once you are familiar with the writing system and some syntax and grammar, you may be able to read the text but not understand it. You will still need to build up vocabulary. Even after you start commanding a fairly impressive vocabulary, you may not be able to understand obscure local idioms, phrases that are intentionally twisted for artistic effect, scientific papers in a domain you don't understand, and so on.

Once you have decent reading ability, can the text itself hinder your ability to read? Besides what we've already mentioned, such as specialized or obscure language and words, some presentational aspects can get in the way: text that is too small, text in a color too similar to the background, the wrong writing direction, flipped or rotated letters, a poorly designed font, and so on. While these too can be overcome with familiarity, doing so takes significantly more effort, and in some cases they can be practically impossible to surmount.

Readability in source code

Several aspects affect our ability to read code, ranging from the most obvious to the less obvious.

Let's first get the most obvious out of the way:

Editor settings (font, color contrast, etc.)
Display (resolution, distance, etc.)
Familiarity with the language (do you even know how to read it?)

From this point on I'm going to assume those factors are non-issues.

I divide the other factors broadly into three categories:

Formatting
Naming
Patterns

Formatting

The first issue with reading code is formatting. In some languages, formatting is an integral part of either the language itself (e.g., Python's indentation rules) or its culture (e.g., the Go and Elm communities standardizing on a single formatter). In languages like JavaScript, we have several competing formatting standards, but for the most part they are not too different from each other. It is still possible to write badly formatted code like this:

exports.GET = function
    (req, res) {
    indexes
        .getLatestPosts()
        .then(
            function (posts) {
                res.renderToTemplate(
                    'index',
                    {
                        posts
                    }
                )
            }
        )
        .catch(err =>
            res.renderToError(500, err))
}

This is syntactically valid JavaScript that the engine will happily run for you. Even though it's technically possible to decipher, it is nevertheless hard to read. Formatted in a more idiomatic way, the same code might look like this:

exports.GET = function (req, res) {
    indexes.getLatestPosts()
        .then(function (posts) {
            res.renderToTemplate('index', {posts})
        })
        .catch(err => res.renderToError(500, err))
}

Beyond this point, formatting discussions are mostly a matter of preference. For example, whether to add spaces inside the braces in:

{ posts }

That has very little impact on readability. Apart from, perhaps, a very temporary glitch when someone encounters a formatting pattern they are not familiar with (e.g., they've never seen people put spaces inside braces), it will probably not prevent anyone from quickly gathering that we are talking about an object.

There are two kinds of familiarity with formatting patterns, though. Firstly, there is the familiarity we build by encountering many different examples of JavaScript and noticing the commonalities and differences between them. This is what is commonly referred to as "experience". Secondly, there is familiarity at the project or company level, where we get accustomed to the patterns within a much smaller body of work.

To reuse the above example, we have an inconsistency in the use of callback functions: one is written as an anonymous function expression, while the other is an arrow function expression. While both are perfectly readable, the difference creates a small, temporary glitch while we figure out why it's there and whether it matters. We would not have this dilemma if both used the same style of function:

exports.GET = function (req, res) {
    indexes.getLatestPosts()
        .then(function (posts) {
            res.renderToTemplate('index', {posts})
        })
        .catch(function (err) {
            return res.renderToError(500, err)
        })
}

Internal consistency of formatting therefore plays a role in readability. It trains our eyes to recognize what's coming next and what that might mean in the given context.

There are various formatting tools available. However, tools aren't always effective at improving readability. In communities such as Python, Go, and Elm, there are conventions and tools that pretty much the entire community agrees on (the PEP 8 style guide in Python, the gofmt tool in Go, and elm-format in Elm). Most programmers adhere to them, so moving from one unfamiliar code base to another is a lot easier in those languages (especially Go and Elm) than in others. This is mostly thanks to the consistency these tools introduce, which they achieve (most of the time) through careful use of whitespace and line breaks.

On the other hand, the presence of a tool, or even its mass adoption, does not always guarantee better readability. For example, some JavaScript formatters are particularly bad at preserving consistency within the code, because they place an (arbitrary) line length limit above consistency. Some of these tools are therefore seen as a simple means of ending code style discussions (which can sometimes get out of hand) rather than as a way to improve readability.

Naming semantics

Single-letter variables are widely considered bad for readability, but there are legitimate cases where a single-letter variable makes code relatively easy to follow. For example:

var names = []
for (var i = 0; i < people.length; i++) {
    names.push(people[i].name)
}

Here the variable i is more or less a convention for counters within loops, so it would actually look unusual if we used something more verbose. The same goes for letters j, k, and l, which are used when loops are nested.
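To illustrate the nested-loop convention, here is a small sketch. The transpose() function is just an illustration I made up, not something from the original post or a library: i indexes the rows of the result, and j the rows of the input.

```javascript
// Hypothetical example: transpose a matrix using the conventional
// nested loop counters i (outer) and j (inner).
function transpose(matrix) {
    const result = []
    // i walks the rows of the result (the columns of the input)
    for (let i = 0; i < matrix[0].length; i++) {
        result.push([])
        // j walks the rows of the input
        for (let j = 0; j < matrix.length; j++) {
            result[i].push(matrix[j][i])
        }
    }
    return result
}

const t = transpose([[1, 2, 3], [4, 5, 6]])
// t is [[1, 4], [2, 5], [3, 6]]
```

Renaming i and j to rowIndex and columnIndex would arguably make this harder to scan, not easier, because the reader has to parse longer names to recover the same convention.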

Other examples include x, y, and z for coordinates, n and m for numeric values with no particular meaning beyond being numbers, and so on. When catching errors, e is commonly used in the catch clause, and k and v are sometimes used to mean "key" and "value" respectively. These conventions sometimes span multiple languages, and they are not particularly difficult to pick up by reading other people's code on a regular basis.

Some of them are only meaningful in a certain context. For example, moveTo(m, n) is less common than moveTo(x, y) because, in the context of the moveTo() function, the parameters are not just any numeric values but represent coordinates.
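Here is a short sketch of these conventions side by side. The prices object and the distance() helper are made-up examples; only Object.entries(), JSON.parse(), and Math.sqrt() are standard JavaScript.

```javascript
// k and v read naturally as "key" and "value" when iterating entries
const prices = { apple: 1.25, pear: 0.8 }
const labels = []
for (const [k, v] of Object.entries(prices)) {
    labels.push(`${k}: ${v}`)
}

// e is the customary name for the error in a catch clause
let message = ''
try {
    JSON.parse('{not json')
} catch (e) {
    message = e.name // 'SyntaxError'
}

// Hypothetical helper: x and y are understood as coordinates without
// further explanation, which m and n would not convey
function distance(x, y) {
    return Math.sqrt(x * x + y * y)
}

// labels is ['apple: 1.25', 'pear: 0.8'], distance(3, 4) is 5
```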

Using long names does not necessarily improve readability. Consider the following example:

if (flavorList.includes('red')) {
    return 'No red flavors, sorry'
}
if (flavorList == 'green') {
    return 'Green is sold out'
}

The code is a bit confusing because we can't tell whether flavorList is intended to be a list (an array) or a string. It is probably a string: the includes() method exists on both strings and arrays, but the second condition compares flavorList to a string.
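The ambiguity matters because the two types answer the same question differently. A quick sketch with made-up values: on a string, includes() does a substring search, while on an array it looks for an exact element. Loose equality (==) blurs the difference further, because an array compared to a string is first converted to a string.

```javascript
// If flavorList is a string, includes() is a substring search...
const asString = 'reddish-brown'
console.log(asString.includes('red'))  // true: 'red' is a substring

// ...but if it is an array, includes() looks for an equal element
const asArray = ['reddish-brown']
console.log(asArray.includes('red'))   // false: no element equals 'red'

// == converts ['green'] to the string 'green' before comparing,
// so the second check "works" for both types
console.log(['green'] == 'green')      // true
console.log('green' == 'green')        // true
```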

It's also possible that the word "flavor" should really be "color". Common naming mistakes include:

Using verbs for values that aren't functions
Using nouns for functions
Using plurals for single items and vice versa, singular for multiple items
Using meaningless names like object, array, number

Sometimes names that are good on their own may actually hinder readability. A common example is using singular and plural forms that differ by only one letter:

var totalPages = 0
for (var book of books) {
  totalPages += books.pages
}

In this case, it's relatively easy to miss that books on line 3 should be book.

This type of bad naming definitely makes the code harder to read. However, there are legitimate names that are correct, but still hard to understand. For instance:

<ChoiceList isStandard choices={seatChoices} />

In this code, isStandard is probably not understandable to anyone reading this. Sure, we understand what the word "standard" means, and we know the React syntax (well, some of us do, anyway), so we know it's a boolean prop. However, we cannot tell whether it is the seats that are standard, or the choice list, or something else. Only people working on the project know that it means the field is present in all forms of this type, rather than being a custom, user-specified field.

Someone in the back says, "Well, you should have just written customField={false}." Ah, but could you have made that suggestion before I explained what isStandard means? I doubt it. And even if it had been written that way from the start, would it actually be clearer? Well, now it's too late to find out, because we already know what it means.

Although isStandard was unreadable to anyone reading this post, that is a matter of familiarity, not of a naming mistake or syntax abuse. Whether you use JavaScript, TypeScript, or some other language makes very little difference in this case.

Also, while isStandard is a domain-specific name (specific to the project), there are other, more generic names that may be unfamiliar to those who have never encountered them before.

For example:

function updatePriceTable(semaphore, products) {
    semaphore.attemptLock(function (release) {
        db
            .writePrices(products.map(function (product) {
                return [product.id, product.price]
            }))
            // release the lock whether the write succeeds or fails
            .finally(release)
    })
}

If you are not familiar with semaphores, chances are you wouldn't be able to guess whether this is a domain-specific name or a generic programming concept, especially since semaphores aren't very common in JavaScript. Could we have named it resourceAccessLocker or something more descriptive? Sure, we could have. But the point is that "semaphore" is a legitimate name for that concept. Familiarity with the domain is not the only kind of familiarity that influences the readability of code; knowledge of general programming concepts also plays a role.
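For readers who haven't met one, here is a minimal sketch of the idea, written in the same callback style as updatePriceTable() above. The Semaphore class and its attemptLock() method are hypothetical, made up to match the example; they are not a real library API. A semaphore hands out a fixed number of slots; callers beyond that limit wait in a queue until someone releases.

```javascript
// Hypothetical counting semaphore. attemptLock() runs the callback as soon
// as a slot is free and passes it a release() function to call when done.
class Semaphore {
    constructor(slots = 1) {
        this.slots = slots  // how many callers may hold the lock at once
        this.waiting = []   // callbacks queued until a slot frees up
    }

    attemptLock(callback) {
        if (this.slots > 0) {
            this.slots--
            callback(() => this.release())
        } else {
            this.waiting.push(callback)
        }
    }

    release() {
        const next = this.waiting.shift()
        if (next) {
            // Hand the freed slot straight to the next waiting caller
            next(() => this.release())
        } else {
            this.slots++
        }
    }
}

// Usage: with one slot, the second caller waits for the first to release
const lock = new Semaphore(1)
const log = []
let releaseFirst
lock.attemptLock(release => { log.push('first'); releaseFirst = release })
lock.attemptLock(release => { log.push('second'); release() })
// Only 'first' has run so far; 'second' is queued
releaseFirst()
// Now log is ['first', 'second'] and the slot is free again
```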

Familiarity with general programming concepts can also sometimes hinder readability, though. For example, people coming to JavaScript from other languages sometimes assume that const makes values immutable, based on how constants work in those languages. In JavaScript, const only prevents the variable from being reassigned; an object or array declared with const can still be modified. This breaks the intuition these people have about how it is supposed to work.
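A short sketch of the difference, with made-up values: const blocks reassignment of the binding, while Object.freeze() (not mentioned in the original, but a standard built-in) is what actually prevents changes to the object, and only shallowly at that.

```javascript
// const protects the binding, not the value
const sizes = ['S', 'M']
sizes.push('L')             // allowed: the array itself is mutable

let reassignError = null
try {
    sizes = []              // not allowed: throws a TypeError at runtime
} catch (e) {
    reassignError = e
}

// Object.freeze() makes the array itself (shallowly) immutable
const frozen = Object.freeze(['S', 'M'])
let pushError = null
try {
    frozen.push('L')        // throws a TypeError: the array is not extensible
} catch (e) {
    pushError = e
}

// sizes is now ['S', 'M', 'L']; frozen is still ['S', 'M']
```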

Patterns

Moving up from names of objects and concepts, we have a bigger picture: the patterns. The ability to read larger sections of code depends on familiarity with various software design concepts. We are not talking about the ability to read for loops. That's more of a language familiarity issue.

We are talking about reading something like this:

var person = {
    name: '',
    age: 0,
    toString() {
        return `${this.name} (${this.age} y/o)`
    }
}

var bob = Object.assign(Object.create(person), {
    name: 'Bob',
    age: 22,
})

This is a prototypal inheritance pattern popularized by Douglas Crockford a long time ago. If you're unfamiliar with this concept, you may not be able to understand why it's done this way, even if you understand what functions like Object.create() or Object.assign() do. Here the pattern lies in the particular usage of these functions to achieve a certain effect (inheritance).
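To make the effect of the pattern visible, the sketch below repeats the setup above and then inspects bob. The point it shows: bob only owns name and age, and toString() is found by walking up the prototype chain to person.

```javascript
// Same setup as above: person serves as the prototype for bob
var person = {
    name: '',
    age: 0,
    toString() {
        return `${this.name} (${this.age} y/o)`
    }
}

var bob = Object.assign(Object.create(person), {
    name: 'Bob',
    age: 22,
})

// bob owns only the properties copied in by Object.assign()...
console.log(Object.keys(bob))                       // [ 'name', 'age' ]
// ...toString() is inherited through the prototype chain
console.log(Object.getPrototypeOf(bob) === person)  // true
console.log(String(bob))                            // Bob (22 y/o)
```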

Or consider this:

function handleRequest(req, res, next) {
    // ....
}

The arguments req, res, and next are immediately familiar to people who have used Express or some other Node.js middleware-based library. They will probably have a lot of intuition about what each of these represents, as well as what it implies when a function takes them in that particular order.
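Much of that intuition comes from how next() passes control along a chain of handlers. Below is a deliberately tiny, hypothetical runner (runMiddleware() is made up; it is not how Express is implemented, and it leaves out error handling entirely) that shows the shape.

```javascript
// Hypothetical middleware runner: each handler receives the request, the
// response, and a next() callback that invokes the following handler.
function runMiddleware(stack, req, res) {
    let index = 0
    function next() {
        const handler = stack[index++]
        if (handler) {
            handler(req, res, next)
        }
    }
    next()
}

const log = []
const res = {}

runMiddleware([
    function logRequest(req, res, next) {
        log.push(`${req.method} ${req.url}`)
        next() // hand over to the next handler in the stack
    },
    function handleRequest(req, res, next) {
        res.body = 'Hello' // respond without calling next(): the chain stops here
    },
    function neverReached(req, res, next) {
        log.push('unreachable')
    },
], { method: 'GET', url: '/' }, res)

// log is ['GET /'] and res.body is 'Hello'
```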

Or a function like this:

function addVnodes(parent, before, vnodes, startIdx, endIdx) {
    // ....
}

It's a very niche domain, but anyone who has dealt with virtual DOM internals might recognize that this function inserts virtual nodes into a node's child list, in particular the vnodes between startIdx and endIdx, placed before another node. On the other hand, it would probably cause a WTF moment for someone who has never seen one of these.
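For the curious, here is a rough sketch of what such a function might do. It uses plain objects with a children array instead of real DOM nodes, and the details (endIdx being inclusive, a null or missing before meaning "append") are my own assumptions, not taken from any particular library.

```javascript
// Hypothetical sketch: parent is a plain object with a children array,
// standing in for a real DOM element. Inserts vnodes[startIdx..endIdx]
// (inclusive) into parent.children, in order, just before `before`.
// If before is null or not found, the nodes are appended at the end.
function addVnodes(parent, before, vnodes, startIdx, endIdx) {
    for (let i = startIdx; i <= endIdx; i++) {
        const vnode = vnodes[i]
        if (vnode == null) continue // skip holes in the list
        const at = before === null ? -1 : parent.children.indexOf(before)
        if (at === -1) {
            parent.children.push(vnode)
        } else {
            parent.children.splice(at, 0, vnode)
        }
    }
}

// Insert vnodes 1..2 ('b' and 'c') before 'd'; 'x' at index 0 is skipped
const d = { text: 'd' }
const list = { tag: 'ul', children: [{ text: 'a' }, d] }
addVnodes(list, d, [{ text: 'x' }, { text: 'b' }, { text: 'c' }], 1, 2)
// list.children now reads a, b, c, d
```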

Algorithms can sometimes be quite unreadable to people who are not familiar with them, or at least with the particular one being read, but people who use them on a regular basis may recognize them by shape, even before they read into the details.
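Binary search is a good example of an algorithm recognizable by its shape: two bounds closing in on each other around a midpoint. The version below is my own illustration, assuming a sorted array of numbers; it returns the index of the target, or -1 if the target is absent.

```javascript
// Classic binary search over a sorted array of numbers
function binarySearch(sorted, target) {
    let lo = 0
    let hi = sorted.length - 1
    while (lo <= hi) {
        const mid = Math.floor((lo + hi) / 2)
        if (sorted[mid] === target) return mid
        if (sorted[mid] < target) {
            lo = mid + 1 // target is in the upper half
        } else {
            hi = mid - 1 // target is in the lower half
        }
    }
    return -1 // bounds crossed: target is not present
}

// binarySearch([1, 3, 5, 7, 9], 7) is 3; binarySearch([1, 3, 5, 7, 9], 4) is -1
```

Someone who has written this a few times will likely spot the lo/hi/mid trio and know what the function does without tracing a single iteration.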

On the other end of the pattern spectrum, we may have code that looks like this:

var session = new Store()
session.addEventListener('new', function (event) {
    // ...
})

This time we are borrowing a pattern from the underlying platform. Anyone who has worked with event listeners in the DOM will immediately recognize that the Store object works like an EventTarget. Their intuition about how EventTarget works then tells them that the event object probably has a target property pointing to the Store instance, and that they can probably call removeEventListener(). In other words, by reusing the pattern we reuse the intuition, and thereby improve readability.

If we do this wrong, however, we've successfully reduced readability. For instance, if the event does not have a target property pointing to the Store instance, we've made the Store object counter-intuitive: the user following their intuition about how EventTarget works will have misread the last example by making wrong assumptions about it.
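One way to get this right is to not imitate EventTarget at all, but to extend it. The sketch below assumes a runtime with global EventTarget and Event classes (modern browsers, or Node.js 15 and later); the Store class, its items array, and its add() method are hypothetical. The SurprisingStore lookalike shows the counter-intuitive version described above.

```javascript
// Hypothetical Store built on the platform's EventTarget
class Store extends EventTarget {
    constructor() {
        super()
        this.items = []
    }
    add(item) {
        this.items.push(item)
        this.dispatchEvent(new Event('new'))
    }
}

const session = new Store()
let seenTarget = null
let calls = 0
function onNew(event) {
    calls++
    seenTarget = event.target // the Store instance, as intuition suggests
}
session.addEventListener('new', onNew)
session.add('first')
session.removeEventListener('new', onNew) // also works as expected
session.add('second')                     // onNew is not called again

// Counter-example: a hand-rolled lookalike whose "event" has no target
class SurprisingStore {
    constructor() {
        this.listeners = []
    }
    addEventListener(type, listener) {
        this.listeners.push({ type, listener })
    }
    add(item) {
        for (const { type, listener } of this.listeners) {
            if (type === 'new') listener({ type: 'new', item })
        }
    }
}

const surprising = new SurprisingStore()
let surprisingTarget = 'not called'
surprising.addEventListener('new', event => { surprisingTarget = event.target })
surprising.add('first')
// surprisingTarget is undefined, so code written from EventTarget intuition breaks
```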

Recognizing various patterns is similar to recognizing sentences or even entire paragraphs. For example, most developers will know what it means when an email starts with "After careful consideration", even without reading the rest. This works the same way when reading various code patterns.

Readability is not simple

People like to say that this makes code more readable, or that makes code more readable. In reality, readability is a much more complex topic, and there are many variables that make or break readability (pun unintended). It is, in my opinion, silly to even think, let alone claim, that a single keyword, or even a whole programming language, can by itself make code more readable.

For all these reasons, I also do not believe that anyone can write code that is universally readable to everyone else. Sometimes, we may struggle to read our own code as our familiarity with the domain evaporates with time. This is not because our code was unreadable, of course.

This is precisely the reason I believe comments do help. In the short term, we may think something is quite obvious, but we can never be sure that the knowledge that makes it obvious will not simply vanish from our memory over time. A claim that comments are unnecessary if the code is well-written is, I think, a bit naive or at least short-sighted.

Although there are things you can do to the code to improve readability, improving the reading skill (i.e., familiarity with the language, patterns, conventions, domain, practices) is far more important, and can overcome even code that is not ideal.