Criteria to assess the usefulness of a unit test



From the keynote at XpDay London by Mark Striebeck, engineering manager at Google, where he is responsible for developer testing infrastructure, tools and adoption.

Here are some criteria to assess the usefulness of a unit test:
  • How often does the test fail? When a test never fails, chances are it is not testing anything that needs testing.

  • Has the test been marked as "ignore" in order to release the system? In that case its failures are not treated as real bugs.

  • When the test is red, what changes are usually needed to fix it?
    • adding or changing a feature => ok
    • changing the test code => ko, a false alarm
    • other changes => ko, a false alarm
Does it never fail, is it often ignored, or does it raise false alarms most of the time? Then it is not giving back much value. My advice: come on, just delete it!
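To make these checks concrete, here is a minimal sketch (mine, not from the keynote) that scans a hypothetical per-run history of a test suite and flags review candidates; the CSV layout and the thresholds are assumptions for illustration only.

    # Sketch: flag tests that never fail, are often ignored, or mostly raise false alarms.
    # Assumed CSV columns: test_name, outcome (pass/fail/ignored), fix_kind
    # (feature_change / test_change / other, empty when the run did not fail).
    import csv
    from collections import defaultdict

    def review_candidates(history_csv):
        stats = defaultdict(lambda: {"runs": 0, "fails": 0, "ignored": 0, "false_alarms": 0})
        with open(history_csv, newline="") as f:
            for row in csv.DictReader(f):
                s = stats[row["test_name"]]
                s["runs"] += 1
                if row["outcome"] == "ignored":
                    s["ignored"] += 1
                elif row["outcome"] == "fail":
                    s["fails"] += 1
                    # a failure fixed by changing the test, or by something other than
                    # a feature change, counts as a false alarm
                    if row["fix_kind"] in ("test_change", "other"):
                        s["false_alarms"] += 1
        for name, s in stats.items():
            never_fails = s["fails"] == 0
            often_ignored = s["ignored"] > s["runs"] / 2
            mostly_noise = s["fails"] > 0 and s["false_alarms"] > s["fails"] / 2
            if never_fails or often_ignored or mostly_noise:
                print("review candidate:", name, s)

    # review_candidates("test_history.csv")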

And I add this advice: don't forget that the primary benefit of unit tests and TDD is to let a good design emerge. Update: I learned that it is perfectly acceptable for TDD to focus also on specifications and verification; see the TDD Continuum.


Update: more notes on this keynote here: Improving testing practices at Google
Update: the blog of Google about Testing: http://googletesting.blogspot.com/

posted @ Saturday, 12 December 2009 14.25

Comments on this entry:

# re: Criteria to assess the usefulness of a unit test
by Gil Zilberfeld at 16/12/2009 14.24

Hi Luca,

In the latest installment of our webcast, This Week in Testing, we talked about your post (actually about your description of Mark's presentation). I invite you to check it out, comment, and if you really like what you see and want to see more, spread the word.

Gil Zilberfeld, Typemock
  
# re: Criteria to assess the usefulness of a unit test
by Luca Minudel at 16/12/2009 22.52

Hi Gil, nice webcast.

The idea of TROI (Tests Return On Investment) reported by Mark Striebeck looks interesting and can help advance unit testing more as a science than as a "religion".

Suppose you spend 30 minutes writing unit test A, and in one year unit test A fails 3 times because of bugs. Then unit test A has a good TROI: the confidence given by test A is based on the evidence that it has been useful.
Suppose unit test B also requires 30 minutes and, in one year of changes to the feature, never fails. Then the TROI of unit test B is actually zero, and the confidence given by test B cannot be based on data/facts/evidence: that sense of confidence is now a false sense of confidence.
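In code, the comparison above boils down to something like this back-of-the-envelope calculation; the formula (bug-revealing failures per minute invested) is just my own reading of TROI, not Striebeck's definition:

    # TROI approximated as bug-revealing failures per minute invested (my assumption)
    def troi(bug_failures_per_year, minutes_invested):
        return bug_failures_per_year / minutes_invested

    print(troi(3, 30))   # unit test A: 0.1 -> confidence backed by evidence
    print(troi(0, 30))   # unit test B: 0.0 -> no evidence of value (yet)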

So unit test B deserves some investigation:
- is it useful anyway because it documents a requirement, or is it instead hard to read/understand?
- is it testing some code that a change could break, or is it testing something that is very unlikely to fail (e.g. because the compiler or other tests already check it)? Or worse, can the test not even fail because it is wrong?

Deleting code that is unreachable or unused is as good a practice as deleting tests that have zero TROI and no other use. They will remain in the repo history, and that is enough.

When we give up the responsibility to point out a unit test that now has zero or very low value, we also lose the chance to show the high value of the other unit tests, which is demonstrable by facts.
When we base our confidence on unproven facts/evidence, we also relinquish the critical approach that is fundamental to increasing the TROI and improving the quality of the next unit tests that we write.
  
# re: Criteria to assess the usefulness of a unit test
by Rogério Liesenfeld at 17/12/2009 17.49

Even a test that "never" fails is useful, as it contributes to increased code coverage, and provides regression testing. If you remove the test, some maintenance developer may introduce a bug in the future and nobody will ever notice until users complain.

For me, the primary benefit of writing developer tests is testing, not design. Instead, good design is the primary benefit of refactoring (a crucial step in TDD).
  
# re: Criteria to assess the usefulness of a unit test
by Luca Minudel at 18/12/2009 4.45

Hi Rogerio,

code coverage is not a valid criterion to measure the quality/usefulness of a test. Indeed, code coverage can only expose code that hasn't been adequately tested.
You can read about this in "Don't be fooled by the coverage report" from IBM. You can also read about the limitations of "line coverage" (the one measured by almost all tools) compared to "branch coverage".
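A tiny example of that "line coverage" blind spot (the example is mine, not from the IBM article): the single test below executes every line of withdraw, so line coverage is 100%, yet only one branch of the if is ever taken, and the untested overdraft path, which branch coverage would expose, hides a bug.

    import unittest

    def withdraw(balance, amount):
        if amount <= balance:
            balance -= amount
        return balance          # bug: an overdraft is silently ignored instead of rejected

    class WithdrawTest(unittest.TestCase):
        def test_withdraw_within_balance(self):
            # executes every line of withdraw, but only the True branch of the if
            self.assertEqual(withdraw(100, 30), 70)

    if __name__ == "__main__":
        unittest.main()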

Tim Mackinnon, one of the inventors of mockist TDD, says in Mock Roles, Not Objects that TDD's most relevant benefit is about design.
He also says that TDD tends to produce simpler code because it focuses on immediate requirements rather than on future-proofing.
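As a hedged illustration of that "roles, not objects" idea (my example, not from the paper), a test like this drives out the collaborator role the domain object needs, an approver, before any concrete implementation of that role exists:

    from unittest import TestCase, mock

    class Account:
        def __init__(self, approver, limit=1000):
            self.approver = approver
            self.limit = limit

        def deposit(self, amount):
            if amount > self.limit:
                # the collaborator role discovered while writing the test
                self.approver.request_approval(amount)

    class DepositTest(TestCase):
        def test_deposit_over_the_limit_requests_approval(self):
            approver = mock.Mock()
            Account(approver, limit=1000).deposit(5000)
            approver.request_approval.assert_called_once_with(5000)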
  
# re: Criteria to assess the usefulness of a unit test
by Gil Zilberfeld at 20/12/2009 15.22

Hi Luca,

It's an interesting point of view - TROI. In a way, writing a test (disregarding the thinking, design and API usage benefits), for testing reasons only, is a gamble.
It may prove useful (by failing somewhere in the future) or not (which is the exact opposite of, and parallel to, not writing a test). I'm not sure you can actually separate this value from the other values, but let's say you can - do you feel it's sending the wrong message to developers?

Gil Zilberfeld
Typemock
  
# re: Criteria to assess the usefulness of a unit test
by Luca Minudel at 22/12/2009 19.49

Hi Gil, it depends.

When a developer or a team begins with TDD for the first time, the coach's responsibility is to ask for tests, simplify problems, remove obstacles and relax time constraints.
The right message at this point is "do write unit tests".

After a while, when the developer or the team has written a good number of tests, it is not uncommon that some tests in the suite are slow, fail intermittently, or are incomprehensible or meaningless.
At this point the right message is "bad tests must be fixed or deleted when it is most convenient to do so": the developer or the team will learn what is bad and what is convenient.
A more detailed explanation follows.


The coach's responsibility now is to guide the developer or the team to reflect on what good unit tests are, on which tests are the most useful to write, and on when it is convenient to fix a bad test or when it is more convenient to delete it and spend the saved time writing a new good test. Time-wise decisions are essential when working on a legacy code-base that does not yet have unit test coverage, and time-wise decisions are important in general because the sprint is time-boxed.
This is the right time to review the tests that are marked with "Ignore" before a release, that often fail for reasons other than bugs, or that never fail. The latter are not the most frequent, but they are good for highlighting the practice of reviewing tests and assessing unit test value.

The review of a test that has never failed for a long time can reveal that it is testing a feature that has been deleted (e.g. the deleted feature is the upper limit on the deposit amount and the test is Check_that_deposit_under_the_upper_limit_doesnot_requires_approval), or that it is not really testing anything (e.g. because the Assert just checks the return value set for a mocked method 3 instructions before), or that it does not test the boundaries and so could pass even when there is a bug, or that it has been written after the implementation (not in TDD), which can explain why it tests irrelevant details.

When the review of a never-failing test does not reveal a clear answer, not even to an experienced TDDer, it should not be deleted; it should stay there and the dev/team should wait for more evidence. Just like in the film "2001: A Space Odyssey", when HAL 9000 predicts a failure of the AE-35 unit within 72 hours that is not sustained by actual evidence, and the Earth-based ground control suggests waiting for the unit to fail. Instead, when the result of the review suggests that deleting the test is more convenient than fixing it, there is no reason to be afraid or sorry: take your courage in both hands and shoot!
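For illustration, the "cannot even fail" pattern mentioned above could look like this (my own reconstruction of the mocked-return-value example): the assertion only reads back the value stubbed on the mock a few lines earlier, so it exercises the mock rather than any production code, and no code change can ever break it.

    from unittest import TestCase, mock

    class TautologicalTest(TestCase):
        def test_current_interest_rate(self):
            rates = mock.Mock()
            rates.current_rate.return_value = 0.05        # stubbed here...
            self.assertEqual(rates.current_rate(), 0.05)  # ...and merely read back here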
  
