Criteria to assess the usefulness of a unit test


From the keynote at XpDay London: Mark Striebeck, engineering manager at Google, where he is responsible for developer testing infrastructure, tools and adoption.

Here are some criteria to assess the usefulness of a unit test:
  • How often does the test fail? A test that never fails may not be testing anything that needs testing.

  • Has the test been marked as "Ignore" in order to release the system? If so, its failures are not being treated as real bugs.

  • When the test is red, what changes are usually needed to fix it?
    • add or change a feature => ok
    • change the test code => ko, a false alarm
    • other changes => ko, a false alarm

Does the test never fail, is it often ignored, or does it raise false alarms most of the time? Then it is not giving back much value. My advice: come on, just delete it!
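The false-alarm case can be sketched in a few lines of Python (names and values are hypothetical, not from the keynote): a test that pins an internal interaction with a mock turns red under a harmless refactoring, even though the observable behaviour is intact.

```python
from unittest.mock import Mock

def total_price(items, tax_service):
    """Sum the item prices and add the tax computed on the subtotal."""
    subtotal = sum(items)
    return subtotal + tax_service.tax_for(subtotal)

tax = Mock()
tax.tax_for.return_value = 2

# Behavioural assertion: survives any refactoring that keeps the result.
assert total_price([10, 10], tax) == 22

# Interaction assertion: goes red if total_price is refactored to call
# tax_for once per item -- fixing it needs a test-code change, i.e. "ko".
tax.tax_for.assert_called_once_with(20)
```

When a red bar is usually cured by editing assertions like the last one, the test is coupled to the implementation rather than to the behaviour.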

And I add this advice: don't forget that a primary benefit of unit tests and TDD is to let a good design emerge. Update: I learned that TDD can perfectly well be used to focus also on specification and verification; see the TDD Continuum.

Update: more notes on this keynote here: Improving testing practices at Google
Update: Google's blog about testing:

posted @ Saturday, December 12, 2009 2:25 PM

Comments on this entry:

# re: Criteria to assess the usefulness of a unit test
by Gil Zilberfeld at 12/16/2009 2:24 PM

Hi Luca,

In the latest installment of our webcast, This week in testing, we talked about your post (actually about your description of Mark's presentation). I invite you to check it out and comment, and, if you really like what you see and want to see more, spread the word.

Gil Zilberfeld, Typemock
# re: Criteria to assess the usefulness of a unit test
by Rogério Liesenfeld at 12/17/2009 5:49 PM

Even a test that "never" fails is useful, as it contributes to increased code coverage, and provides regression testing. If you remove the test, some maintenance developer may introduce a bug in the future and nobody will ever notice until users complain.

For me, the primary benefit of writing developer tests is testing, not design. Instead, good design is the primary benefit of refactoring (a crucial step in TDD).
# re: Criteria to assess the usefulness of a unit test
by Luca Minudel at 12/18/2009 4:45 AM

Hi Rogerio,

code coverage is not a valid criterion to measure the quality/usefulness of a test. Indeed, code coverage can only expose code that hasn't been adequately tested.
You can read about this in Don't be fooled by the coverage report (IBM). You can also read about the limitations of "line coverage", the metric measured by almost all tools, compared to "branch coverage".
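The limitation can be seen on a few lines of code (a minimal Python sketch, not from the article): a single test can execute every line yet skip a branch, so line coverage reports 100% while branch coverage reveals the untested path.

```python
def safe_div(a, b, default=0):
    """Divide a by b, returning default when b is zero."""
    result = default
    if b != 0:          # the implicit "b == 0" branch skips this body
        result = a / b
    return result

# This single call executes every line, so line coverage reports 100%...
assert safe_div(6, 3) == 2.0
# ...but the b == 0 path was never taken; only branch coverage flags it.
# A second case is needed to actually exercise that path:
assert safe_div(6, 0) == 0
```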

Tim Mackinnon, one of the inventors of mockist TDD, says in Mock Roles, Not Objects that TDD's most relevant benefit is about design.
He also says that TDD tends to produce simpler code because it focuses on immediate requirements rather than on future-proofing.
# re: Criteria to assess the usefulness of a unit test
by Luca Minudel at 12/22/2009 7:49 PM

Hi Gil, it depends.

When a developer or a team begins with TDD for the first time, the coach's responsibility is to ask for tests, simplify problems, remove obstacles and relax time constraints.
The right message at this point is "do write unit tests".

After a while, when the developer or the team has written a good number of tests, it is not uncommon that some tests in the suite are slow, fail intermittently, or are incomprehensible or meaningless.
At this point the right message is "bad tests must be fixed or deleted, whenever it is most convenient to do so": the developer or the team will learn what is bad and what is convenient.
A more detailed explanation follows.

The coach's responsibility now is to guide the developer or the team to reflect on what good unit tests are, on which tests are most useful to write, and on when it is convenient to fix a bad test versus deleting it and spending the saved time writing a new good test. Time-wise decisions are essential when working on a legacy code-base that does not yet have unit test coverage, and they matter in general because the sprint is time-boxed.
This is the right time to review the tests that are marked with "Ignore" before a release, that often fail for reasons other than bugs, or that never fail. The latter are not the most frequent, but they are good for highlighting the practice of reviewing tests and assessing unit test value.

The review of a test that has never failed for a long time can reveal that the test exercises a feature that has been deleted (e.g. the deleted feature is the upper limit on the deposit amount and the test is Check_that_deposit_under_the_upper_limit_doesnot_requires_approval), or that it is not really testing anything (e.g. because the Assert just checks the return value set on a mocked method three instructions earlier), or that it does not test the boundaries, so it could pass even when there is a bug, or that it was written after the implementation (not in TDD), which can explain why it tests irrelevant details.

When the review of a never-failing test does not reveal a clear answer, not even to an experienced TDDer, the test should not be deleted: it should stay there and the dev/team should wait for more evidence. Just like in the film "2001: A Space Odyssey", when HAL 9000 predicts a failure of the AE-35 unit within 72 hours that is not sustained by actual evidence, Earth-based ground control suggests waiting for the unit to fail. Instead, when the result of the review suggests that deleting the test is more convenient than fixing it, there is no reason to be afraid or sorry: take your courage in both hands and shoot!
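Two of these findings can be sketched in a few lines of Python (a hypothetical example with invented names): a tautological test whose Assert only reads back a value stubbed on a mock moments earlier, and a useful test that exercises real code at the boundary.

```python
from unittest.mock import Mock

# Tautology: the Assert checks the value stubbed on the mock a few
# instructions earlier, so it exercises the mock, not the production code.
account = Mock()
account.requires_approval.return_value = False   # stubbed here...
assert account.requires_approval(900) is False   # ...and read back here

# A useful test instead targets the real code at the boundary:
DEPOSIT_UPPER_LIMIT = 1000

def requires_approval(amount):
    """Deposits above the upper limit need a manual approval."""
    return amount > DEPOSIT_UPPER_LIMIT

assert requires_approval(1000) is False   # exactly on the limit
assert requires_approval(1001) is True    # just above it
```

A tautological test like the first one can never go red, which is exactly why a review of never-failing tests tends to surface it.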
