Hajime, the duck guy

Monday, April 22, 2024, by Hajime Yamasaki Vukelic

Testing for developers is not the same as testing for quality.

Testing is — and will always be — a hot topic among developers. There are different kinds of tests. People talk about unit tests, integration tests, end-to-end tests, functional tests, regression tests... tests, tests, tests!

And then you have people who say "don't write tests", or "don't commit tests". Today, these statements may sound quite shocking, but let's explore a bit what this point of view is, and how it contrasts with the currently prevalent understanding of tests.

To support the points I'm about to discuss in this article, I am going to classify tests into two categories according to their role in the development process rather than the way in which they work. I will call these categories "dev tests" and "QA tests". I'll also try to explain why the point of view that QA tests should not be written by the authors of the production code isn't as irrational as some might think.

Developer bias

There's a joke that goes something like this. A statistician, a physicist, and a mathematician are riding on a train. As they are passing by some pastures, they see a lonely brown cow, grazing out in the open. Seeing that, the statistician says "Based on this sample, I can say 100% of cows are brown." The physicist says to that "Based on what I can observe, I can only say that one cow is brown." To which the mathematician replies "You are both wrong. We can only claim that half a cow is brown."

When we work on a piece of code, we first have to figure out what we're building. It's quite rare in our industry that a developer has the same level of expertise in a given domain as a domain expert. That is, unless they're building a developer tool. In any business (including our own), there are many things people learn that become more like a reflex reaction than a coherent — and slow — thought process. There are things they "just know". This kind of implicit knowledge is hard to convey to others. It's one of those situations where people say "Don't know how I know, but trust me, I know."

Although we are pretty good when it comes to writing code, we are usually dealing with a partial view of the domain. Half a cow, if you will. Even when we have invested significant energy into learning a lot about it, we still miss at least some of the picture. We simply cannot compare to someone who has been doing it full-time for years. Because of this, our view of the business domain is more or less biased. When you combine that with the need to reconcile our biased knowledge of the domain with the technical aspects of the software and to organize this knowledge as code, as well as budgetary concerns, the schedule, and our skills as developers, the picture gets even more skewed.

On top of that, assumptions keep changing as we learn. Even though an assumption may be instantly corrected in our brain as a result of new information, the code doesn't automatically correct itself. We sometimes end up tackling code bases that were developed under one set of assumptions but need fixes based on entirely different ones.

This is why I believe that it is counter-productive to treat our own assumptions about the domain — and, by extension, the resulting code — as set in stone. We simply never know when we'll run into some new information that we did not foresee. In other words, we always need a fresh pair of eyes at the end of the day.

Dramatic changes in the assumptions don't happen very often in some domains, where the industry as a whole resists change until it is compelled by law or public outcry. On the other hand, I have seen major changes in projects that are over a decade old and operate in exactly such domains. This is not necessarily due to a change in the domain itself, but to a change in the development team's understanding of the domain and their subsequent realization that the assumptions underpinning the existing code base were wrong.

Tests as a pair of eyes

Most software engineers today take it for granted that tests represent another pair of eyes on our code. We have the production code that we write to facilitate some business process, and then we also have a separate code base which "looks" at our code and tells us that it's working correctly — QA, in other words. The QA team may additionally write more high-level tests, or there may not even be a separate QA team, and developers are asked to write tests at different levels of detail. That's the common notion engineers have about tests.

But who writes these eyes? As I said before, we always need a fresh pair of eyes on our code. If we are the ones writing the tests, the test eyes are not fresh — they're the same tired eyes that have been coding for hours.

TDD has the potential to address this issue to an extent. With TDD, we write the tests before we write the code, so our eyes are technically still fresh... for the first few times. Unless we write all tests before writing any production code, this advantage soon wears off. As we get into the nitty-gritty of implementing the code after each test, our eyes get tired, and the thought process becomes more and more biased. Every subsequent test is less and less a fresh pair of eyes. Not to mention that we are probably also engaging in software design before we write our first tests, so we're starting with some bias already. And I believe that's perfectly fine. I don't think we can write code from a clean slate without any prejudice. In fact, the more experience we have, the more this holds true.

As long as it's the developer writing the tests, it's not going to work the way we would hope. The only people who can effectively test our assumptions are people other than ourselves. Some companies hire QAs for this reason, and that's a good thing.

There are, of course, exceptions to this. For example, you may have an actual domain expert on the team who doesn't code and doesn't know how to code, so they can give you an unbiased set of eyes to oversee the tests. Such organizations are rare. You may claim that that's the only reasonable way to do things, but as long as it's not the most readily available one, most companies are not going to do it.

Tests as... tests

If we, for a second, assume that the above is correct, and that having developers write QA tests is not optimal, is there any point in writing tests to begin with?

Suppose you're writing some code. Say, a function. There are many ways in which you can make sure what you've written works. For example, the most straightforward way is to just import it into an isolated context (e.g., a command-line console) and call it. We do this because we want immediate feedback about what we have accomplished, so that we know that we can move on to the next step.
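In a Node.js REPL, that kind of poke might look something like this (the slugify function and its module are made up purely for the sake of illustration):

    $ node
    > const { slugify } = require('./slugify');
    > slugify('Hello, World!')
    'hello-world'
    > slugify('  So   much   whitespace  ')
    'so-much-whitespace'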

The automated version of this is what we commonly refer to as "unit tests". This is better than manual testing in that it helps us rerun the tests multiple times as we work. For small functions that have a small number of inputs, and not a lot of room for error, it may not be worth setting it all up, but for larger functions with complex inputs and/or outputs, it can be a worthwhile investment.
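As a rough sketch of what the automated version could look like, here is the same hypothetical slugify function exercised through Node's built-in node:test runner — any test runner would do; the point is that the checks can be rerun on every change:

    // slugify.test.js — rerun with `node --test` as the implementation evolves
    const test = require('node:test');
    const assert = require('node:assert');
    const { slugify } = require('./slugify');

    test('lowercases words and joins them with dashes', () => {
      assert.strictEqual(slugify('Hello, World!'), 'hello-world');
    });

    test('collapses repeated whitespace', () => {
      assert.strictEqual(slugify('  So   much   whitespace  '), 'so-much-whitespace');
    });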

These tests are mostly about feedback. Plugging the function into a program and poking at it through the outermost user interface can be far more time-consuming than poking at the function directly, so you get feedback faster.

This is what I mean by "dev test". These tests are not a fresh pair of QA eyeballs on your code. They are more akin to scaffolding around it that supports development and makes sure you don't veer off course, or a head lamp that lights your way as you walk through a dense forest at night.

When you view tests this way, it also turns out that the choice of tools for doing the tests expands greatly. Since the point of these tests is to ensure a steady flow of fast feedback, the usual test tools may actually be counter-productive. They may also fit one kind of code better than another (for example, some types of tools are better suited for UI testing, while the same tools may suck at testing plain functions).

In fact, it does not have to be a specialized test tool to begin with. Quokka is a surprisingly effective tool for getting fast feedback when working with JavaScript, and for many types of code it gives better feedback than dedicated testing tools.
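To give a flavor of it, here's a hypothetical scratch file — Quokka evaluates the file as you type and displays the logged values inline, next to the code, so there's no separate test run to wait for:

    // scratch.js — evaluated live by Quokka while you edit
    const { slugify } = require('./slugify');

    console.log(slugify('Hello, World!'));   // value shows up right next to this line
    console.log(slugify(''));                // quick look at an edge case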

What about quality?

For tests related to the quality of software, we need — as I said before — a fresh pair of eyes. A dedicated QA team is always a better solution for this, regardless of how they test the software, because they are unbiased as to the implementation details. The second best solution is for an actual end user or domain expert to give it a try. The third best solution is to do dogfooding internally for a while prior to release. You can, of course, combine all three approaches. (Of course, dogfooding is not always interesting to the developers, so it's understandable that companies don't do it.)

At any rate, I believe it's important to differentiate between the need of the developers to determine that what they wrote actually works, and the need of the business to ensure that what the developers did has a meaningful business impact and is free of undesirable side effects. Asking the developers to take care of both is, in my opinion, a recipe for a conflict of interest. As developers, we have slightly different priorities, and that's why we get paid in the first place.

To put it another way, "works on my machine" is a perfectly reasonable goal as far as the developers are concerned.

(I've seen many developers making fun of people who use this phrase, but realistically, unless you're going to ship someone else's machine to the developers, I don't think it's reasonable to test things anywhere other than in the local environment, a.k.a. "my machine". For some types of work, it's, of course, not too expensive to poke at some other machine, like a server, but in other cases, like mobile or client-side web apps, it's quite difficult.)

Keep or delete the tests?

Earlier, I briefly mentioned something that people may find quite shocking: "don't commit tests", some say.

Tests can be safely kept as long as the assumptions that led to them remain intact. That means that, in most projects, they can never be safely kept. The assumptions evolve over time, and since we are not the domain experts, our assumptions are going to evolve much more rapidly than the domain itself.

I'm aware that there are teams that have a domain expert as a permanent team member, or whose developers are explicitly trained in the domain, which may lead to a situation where tests can actually be kept safely forever, but that's not the usual setup. Domain experts are scarce, and they usually serve the company much better by doing what they're experts in, rather than participating in software development full-time. So, as a general rule, my opinion is that dev tests cannot be considered solid enough to be kept forever.

The cost of having tests that have ceased to support the underlying assumptions can be quite high. Most developers can tell you that having lots of tests break after a change tends to result in missed dinners and lost sleep.

Such incidents indicate two issues. One is the change in the assumptions. It's a problem, but it happens — it's normal. The other is poor isolation between different parts of the software and tight coupling. Sadly, neither of these issues can be addressed by tests.

Since tests cannot be kept forever and are subject to ongoing maintenance just like the rest of the code, they can — and usually do — turn into an additional pressure point for the developers.

Some may be thinking that it's actually great that the tests were able to show us we have a problem. I would like to highlight that the tests did not prevent these problems, and were only able to point them out once they had gotten way out of hand. And now, the tests themselves have become part of the problem.

In my opinion, this is simply not a good cost-benefit ratio. I can, therefore, see why some would seek to challenge the notion that tests, once written, must remain part of the code base.

Posted in Opinion