Code Coverage – What is it good for?

Absolutely nothing!

Say it again!

That is certainly the response I got when I asked an architect about code coverage. It also seemed to be the conclusion of a long email discussion we had here last year, and it was the response I expected.
The most useful thing to come out of that discussion was a referral to Martin Fowler’s blog post on Test Coverage. To some extent his post confirmed what I thought code coverage was good for, which is indicating where the untested code lives. What code coverage is not good for is setting a target that can then be happily ‘gamed’ to produce the number everyone first thought of (N.B. a handy skill if you are an accountant).

Why is code coverage a good-for-nothing?

Code coverage is seen as a good-for-nothing that adds no value to a team because historically it has been a number to compare against a target arbitrarily chosen by management. The expectation is that all anyone cares about is keeping the actual code coverage above that target. Management will impose sanctions if it falls short, so developers are encouraged to write reams of meaningless tests in order to ‘game’ the coverage number onto the right side of the limit. See Why 100% code coverage is not enough for an example of this in action.
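
As a rough sketch of what ‘gaming’ can look like (in Python, using a made-up apply_discount function and made-up test names, none of which come from the linked article): both tests below execute every line of the function, so a coverage tool would report 100% for either of them, but only the second would notice if the calculation were wrong.

```python
def apply_discount(price, percent):
    """Hypothetical function under test: reduce price by percent."""
    if percent < 0 or percent > 100:
        raise ValueError("percent must be between 0 and 100")
    return price - price * percent / 100


def test_gamed():
    # Runs both the happy path and the error path, so every line is
    # covered, but nothing about the results is ever checked.
    apply_discount(100, 10)
    try:
        apply_discount(100, 150)
    except ValueError:
        pass


def test_meaningful():
    # Covers exactly the same lines, but pins down the behaviour.
    assert apply_discount(100, 10) == 90
    try:
        apply_discount(100, 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")
```

If the return line were changed to add the discount instead of subtracting it, test_gamed would still pass and the coverage number would not move; test_meaningful would fail.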

TDD Discipline results in good Code Coverage

I had also recently been on a TDD course, and the most important thing about TDD that I learned from that course is that it is a discipline. Discipline is the most important thing because without it TDD will not work. Writing tests for your code is not TDD; having 100% code coverage is not TDD.

Writing your tests before you write your code, and writing only the code you need to pass those tests, is TDD, provided you do that (and no more) every time.

It is very difficult to maintain this discipline as a developer, because as soon as you start writing some code then you start adding extra functions or variables that you ‘know’ you will need later. As soon as you start doing that then you have broken the discipline of TDD. A side effect of this is that your code coverage will be less than 100%, but that is not a problem. The problem is that you have untested code that you don’t know isn’t tested, and that you don’t need.
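
To make that concrete, here is a small sketch of the test-first loop in Python using the standard unittest module. The fizz function and its tests are invented for illustration and are not from the course. The tests are written first (and fail until fizz exists); the implementation then contains only what those tests demand.

```python
import unittest


# Step 1: the tests come first. Until fizz() is written they fail.
class TestFizz(unittest.TestCase):
    def test_multiple_of_three_is_fizz(self):
        self.assertEqual(fizz(3), "Fizz")

    def test_other_numbers_are_echoed(self):
        self.assertEqual(fizz(4), "4")


# Step 2: just enough code to make those two tests pass, and no more.
def fizz(n):
    return "Fizz" if n % 3 == 0 else str(n)


if __name__ == "__main__":
    unittest.main()
```

Note what is missing: there is no handling of multiples of five, even though you may ‘know’ it is coming. Under the discipline described above, that branch only gets written once a failing test asks for it.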

The Test-driven Development Maturity Model can help you to understand what level of TDD you are working at, and also highlights at the end what type of developers don’t need to do TDD.

Code coverage lets you know where the untested code is, and thus where the untested risk in your system is located. If the risk in those locations is low, then this may be acceptable. However, if the risk in those locations is high, then it is unacceptable. TDD is an extreme form of this risk detection strategy, in that code is only written if it will be used and tested. You only write code for which there is a failing test, and you only write tests for required business features.
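
The idea of coverage pointing at the untested code can be sketched in a few lines of Python. This is a toy illustration built on the standard library’s sys.settrace and dis.findlinestarts, not how any particular coverage tool is implemented; risk_level and untested_lines are hypothetical names chosen for the example.

```python
import dis
import sys


def risk_level(amount):
    if amount > 1000:
        return "high"
    return "low"


def untested_lines(func, calls):
    """Return offsets (relative to the def line) of lines in func
    that none of the given calls executed."""
    code = func.__code__
    # Every source line that has bytecode attached to it.
    all_lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    executed = set()

    def tracer(frame, event, arg):
        # Record each line the interpreter visits inside func.
        if frame.f_code is code:
            executed.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return sorted(line - code.co_firstlineno for line in all_lines - executed)


# Only the "low" path is exercised, so the "high" branch is reported.
print(untested_lines(risk_level, [(5,)]))
# Exercising both paths leaves nothing unreported.
print(untested_lines(risk_level, [(5,), (5000,)]))
```

Whether the unreported branch matters is the judgement call described above: the tool can only say where the gap is, not how much risk sits in it.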

Discipline vs. focus on critical areas

TDD is a discipline. Our trainer said he could get test coverage on any code within a couple of hours. He also said he’d expect it to be around 80% if the TDD was being done right. Not at least 78.457%, but around 80%. In other words, the aim is not to chase a particular number, but to know what number to expect when the process is being followed. TDD is premised on writing tests first as a discipline, so high coverage should be a by-product of the process.

In comparison, BDD will not have the same by-product, as it is premised on describing behaviours, not on implementing tests first. Dan North at QCon (Accelerating Agile: hyper-performing without the hype) said that his team’s test coverage was more like 6-7%, but in the critical areas it was 150%. I.e. sufficient, but not unnecessary. Every area needs sufficient testing but, to add the most value, no more than that, and in some areas that may be very little indeed. Critical functionality, on the other hand, needs much higher coverage.

One example in our case might be that the core APIs are always critical, so they will always need good coverage. An application, on the other hand, is built on top of the core APIs, so its coverage needs to be sufficient to cover the new features to an appropriate level, without unnecessarily repeating coverage that already exists in the core libraries.

It is not really discipline vs. critical focus. They are two different processes that can each achieve good results. The point is to know which one you are doing (or whichever other process your shop uses) and to do it properly. In both of the cases above, the teams know their code coverage and what it means for their process. The same should be true of any other process too.

So what is it good for?

Code coverage can inform you of the quality of your code. Interpreting and understanding that code coverage, and why it is as it is across your codebase, will evolve over time, and will also involve an understanding of your domain and its requirements.

Code coverage at its most basic allows you to know whether you have tests or not! It can highlight where the testing effort has been targeted. Examining these results allows a team to take a measure of the health of their code and their development practices.

Code Coverage enables safe Refactoring

Refactoring (Martin Fowler again) is a common and much-used practice in BDD and TDD. Agile teams using these methodologies can refactor safely because of the safety net that their extensive regression test packs give them. They know that if they break some previously working functionality then the automated regression tests running on CI will catch those failures. They know this because they wrote lots of unit tests and acceptance tests which give good code coverage. They know the tests give good coverage because those tests were code reviewed, and checked by QAs, when they were first written.

Unfortunately, even agile teams suffer from technical debt and deadline pressure, which means that maybe not all of those tests did get written. So, in fact, they may not have a safety net. And if they don’t run code coverage, because they all agreed there was no need as it was just a management tool to produce a meaningless metric, they don’t know where to focus their efforts to mend the holes in that net.