We need to stop using white/black box testing categorisation

First things first: happy new year lynx! I wish to start this year with a somewhat sensitive topic, and I want to apologise in advance if I say anything wrong; please bear in mind that it comes from my heart and my current knowledge.

According to “Biased” by Jennifer Eberhardt: “Categorization – grouping like things together – is not some abhorrent feature of the human brain, a process that some people engage in and others do not. Rather, it is a universal function of the brain that allows us to organize and manage the overload of stimuli that constantly bombard us”.

In testing, we also have categories that help us learn and understand the types of tests, such as functional vs non-functional. However, the white vs black box categorisation is, in my honest opinion, inaccurate and inadequate. It is also the most widely used one, particularly with beginners. I believe we should stop using these terms, and in this article I will give my reasoning and a better alternative.

Note: For clarity, I will still use these terms until the end of the article.

Promoting stereotypes

You might think that white box tests are no better or worse than black box tests and that it is therefore OK to use those names. However, the idea behind them is that with white box testing you can see the code, whilst with black box testing you can’t. With this categorisation we are promoting the stereotype that white is something clear, usually associated with purity, while black is opaque, usually associated with immorality.

If you really think about it, you would agree that these definitions make no sense.

Why can “you see the code” with a white box test? You shouldn’t be able to see the contents of a white box at all. Consider this: most ceilings are painted white, but can you see the sky or your neighbours through your ceiling? How did we normalise the idea that a white box is somehow see-through?

Here are three boxes with three different colours. I still can’t see inside any of them, can you? (Image credit)

Unit tests category chameleon

If I were to ask you which category unit tests fall into, I bet most of you would say white box, because “you can see the code.” However, if you think in terms of TDD, you would write the tests BEFORE writing the code and therefore you are not seeing it, so by definition they technically shouldn’t be white box.

What’s the actual difference between unit tests, positioned at the base of the testing pyramid, and the other tests?

Is it that the person who writes them also writes the code? But that could be said of any of the others, especially when shifting left.

The biggest difference is that these tests look at code structure rather than feature logic. For example, you don’t care about the number of products shown in the shopping cart, but about whether this particular piece of code handles the right exceptions and produces the right outputs.

Switching to clear vs opaque

In my opinion, just replacing the wording, although a step forward, would not be enough. Whilst it is true that it would be a quick way to avoid perpetuating stereotypes while keeping our systems and classifications intact and ourselves a bit more comfortable, I feel the stereotype would somehow still hold: people would first think white vs black and then translate to these other names.

Furthermore, I have more issues with this categorisation, which I’m going to keep arguing before giving what I think is the best alternative.

Grey box testing

You can tell a category doesn’t make sense when it needs a lot of exceptions and an extra group to gather anything that can’t be placed in the others. Introducing grey box testing: when you… partially? sometimes?… see the code.

Grey box testing on the left… (Image credit)

As before, grey is just another colour: if you use it on a box, you won’t see its interior. Not even partially. What should we use instead? Scratched box? Partially opaque box? It does not quite make sense. Which tests exactly would land in this category?

Test/QA people rarely work with white box

There are a lot of really interesting concepts behind white box testing, but they are usually built into automatic code checking tools and rarely used day to day by test/QA people.

Cyclomatic complexity is rarely calculated manually, although it is sometimes called out in code reviews. Most of the time it is handled as part of code optimisation or computed behind the scenes by other tools.
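To give an idea of what such tools compute, here is a minimal sketch in Python. It is a simplified approximation (one plus the number of decision points it recognises), not how any particular tool works, and the function name and the sample code are made up for illustration.

```python
import ast

# Node types that each count as one decision point in this simplified sketch.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 + the number of decision points found in the source code."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a or b or c" adds one short-circuit path per extra operand.
            decisions += len(node.values) - 1
    return 1 + decisions


# Hypothetical code under analysis: two "if" statements and one "or".
SAMPLE = '''
def shipping_cost(total, is_member):
    if total <= 0:
        raise ValueError("total must be positive")
    if is_member or total > 50:
        return 0
    return 5
'''

print(approx_cyclomatic_complexity(SAMPLE))  # prints 4
```

The result of 4 comes from the base path plus the two if statements and the one "or". Real tools handle many more constructs, which is part of why nobody does this by hand.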

Statement testing, edge testing, branch testing, loop testing, Branch and Relational Operator testing and data flow testing are methods that are rarely planned, examined and automated on a regular basis. Many of these terms are still unknown or unused by others in our industry today.
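For readers who have not met these terms, here is a small sketch in Python of how statement testing and branch testing differ. The apply_discount function and the 10% member discount are hypothetical, invented only for this example.

```python
def apply_discount(price_cents: int, is_member: bool) -> int:
    """Hypothetical helper: members get 10% off, prices are in cents."""
    if is_member:
        price_cents = price_cents - price_cents // 10
    return price_cents


def test_member_path():
    # This single test already executes every statement in apply_discount,
    # so statement testing would be satisfied with it alone.
    assert apply_discount(1000, True) == 900


def test_non_member_path():
    # Branch testing also asks for the case where the condition is false
    # and the body of the "if" is skipped.
    assert apply_discount(1000, False) == 1000
```

Both tests look at how the code is built rather than at what the feature is for, which is the sense in which they belong to this group.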

Most of the rest of the test pyramid has to do with what’s currently called black (or grey) box testing.

Structural vs logical

I think a clearer classification could be structural vs logical. In fact, while researching this article I have been reviewing my university notes, and they list these as alternative names for the same test categories. Yet people rarely use them. Why not use them, when they are more suitable and less damaging?

The distinction would be whether you are testing the structure of the code (number of branches, proper exception handling…) or the logic behind the feature (the shopping cart should have 0 products after a purchase).
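As a rough illustration of the two kinds of test side by side, here is a sketch in Python. The Cart class and its methods are hypothetical, written only for this example, and the tests use plain assertions so they run without any test framework.

```python
class Cart:
    """Hypothetical shopping cart, used only for illustration."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, name: str, quantity: int = 1) -> None:
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        self.items.extend([name] * quantity)

    def purchase(self) -> int:
        """Buy everything in the cart and return how many items were bought."""
        if not self.items:
            raise RuntimeError("cannot purchase an empty cart")
        bought = len(self.items)
        self.items.clear()
        return bought


def test_structural_add_rejects_invalid_quantity():
    # Structural: exercise the error-handling branch inside add().
    cart = Cart()
    try:
        cart.add("book", 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


def test_structural_purchase_rejects_empty_cart():
    # Structural: exercise the guard branch inside purchase().
    cart = Cart()
    try:
        cart.purchase()
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected RuntimeError")


def test_logical_cart_is_empty_after_purchase():
    # Logical: the feature rule from the text, with no interest in branches.
    cart = Cart()
    cart.add("book", 2)
    cart.purchase()
    assert len(cart.items) == 0
```

The first two tests are named after a branch in the code; the last one is named after a rule of the feature. That is the whole distinction, and no colours are needed to explain it.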

Here is a tree: we can classify it by its number of branches or the colour of its trunk, but also by its function: does it bear fruit or have flowers? Are they the right ones? (Image credit)

What about back-end testing, integration testing or database testing? Exactly the same: if you are testing the database for structure, that would be structural; if you are testing it for logic, logical. We can reach more granularity with this definition, although we usually tend to test these for logic rather than structure.

This classification invites us to think about more tools to validate code. It also leaves unit tests as structural tests, with no room for confusion or need for extra categories.

Conclusion

I hope you understand that my point in this article is not to remove a category, but to give it more appropriate terms and definitions. If leaders such as GitHub have started a major shift in their branches to remove old coding terms, why can we not do the same when we already have more appropriate names?

I really hope this post reaches places such as ISTQB® and we start changing our terminology and explanations, as so many people look to them as the standard in the testing area.

As Don Miguel Ruiz said in “The Four Agreements”, you should “be impeccable with your word”. Wouldn’t it be a great start to the year if we applied it to this test categorisation, using structural and logical instead of white box and black box?

Do you agree? Disagree? Let me know in the comments below. I could tell you more about structural testing and give you ideas to improve your coding with it, but that’s… well…another story.