Premature Passes: Why You Might Be Getting Green on Red

Red, green, refactor. The first step in the test-driven development (TDD) cycle is to ensure that your newly written test fails before you try to write the code to make it pass. But why expend the effort and waste the time to run the tests? If you're following TDD, you write each new test for code that doesn't yet exist, and so it shouldn't pass.

But reality says it will happen: from time to time, you will get a green bar when you expect a red bar. (We call this occurrence a premature pass.) Knowing the common reasons for a premature pass can help you diagnose one quickly and save precious time.
  • Running the wrong tests. This smack-your-forehead event occurs when you think you included your new test in the run but did not, for one of myriad reasons. Maybe you forgot to compile or link in the new test, ran the wrong suite, disabled the new test, filtered it out, or coded it improperly so that the tool didn't recognize it as a legitimate test. Suggestion: Always know your current test count, and ensure that your new test causes it to increment.
  • Testing the wrong code. You might have a premature pass for some of the same reasons as "running the wrong tests," such as failure to compile (in which case the "wrong code" that you're running is the last compiled version). Perhaps the build failed and you thought it passed, or your classpath is picking up a different version. More insidiously, if you're mucking with test doubles, your test might not be exercising the class implementation that you think it is (polymorphism can be a tricky beast). Suggestion: Throw an exception as the first line of code you think you're hitting, and re-run the tests.
  • Unfortunate test specification. Sometimes you mistakenly assert the wrong thing, and it happens to match what the system currently does. I recently coded an assertTrue where I meant assertFalse, and spent a few minutes scratching my head when the test passed. Suggestion: Re-read (or have someone else read) your test to ensure it specifies the proper behavior.
  • Invalid assumptions about the system. If you get a premature pass, you know your test is recognized and it's exercising the right code, and you've re-read the test... perhaps the behavior already exists in the system. Your test assumed that the behavior wasn't in the system, and following the process of TDD proved your assumption wrong. Suggestion: Stop and analyze your system, perhaps adding characterization tests, to fully understand how it behaves.
  • Suboptimal test order. As you test-drive a solution, you try to take the smallest possible incremental steps to grow behavior. Sometimes you'll choose a less-than-optimal sequence. You subsequently get a premature pass because the prior implementation unavoidably grew into a more robust solution than desired. Suggestions: Consider starting over and seeking a different sequence with smaller increments. Try applying Uncle Bob's Transformation Priority Premise (TPP).
  • Linked production code. If you are designing an API to be consumed by multiple clients, you'll often introduce convenience methods such as isEmpty (which inquires about the size to determine its answer). These convenience methods necessarily duplicate behavior that already exists. Once size works, isEmpty works too, so if you try to assert against isEmpty every time you assert against size, you'll get premature passes. Suggestions: Create tests that document and demonstrate the link from the convenience method to the core functionality. Or combine the related assertions into a single custom assertion (or helper method).
  • Overcoding. A different form of "invalid assumptions about the system," you overcode when you supply more of an implementation than necessary while test-driving. This is a hard lesson of TDD: supply no more code or data structure than necessary when getting a test to pass. Suggestion: Hard lessons are best learned with dramatic solutions. Discard your bloated solution and try again. It'll be better, we promise.
  • Testing for confidence. On occasion, you'll suspect that a test will generate a premature pass. There's nothing wrong with writing a couple of additional tests ("I wonder if it works for this edge case"), particularly if those tests give you confidence, but technically you have stepped outside the realm of TDD and into the realm of TAD (test-after development). Suggestion: Don't hesitate to write more tests to give you confidence, but you should generally have a good idea of whether they will pass or fail before you run them.
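To make the custom-assertion suggestion under "Linked production code" concrete, here is a minimal sketch in Python (chosen for brevity; the same structure carries over to JUnit). The Stack class and the assert_size helper are hypothetical. The helper checks size and isEmpty together, so the link between them is verified in one place instead of tempting you into a separate, prematurely passing isEmpty test.

```python
import unittest


class Stack:
    """Hypothetical class whose is_empty() convenience method is linked to size()."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def size(self):
        return len(self._items)

    def is_empty(self):
        # Convenience method: answers by asking about the size.
        return self.size() == 0


class StackTest(unittest.TestCase):
    def assert_size(self, stack, expected):
        """Custom assertion: checks size() and the linked is_empty() together."""
        self.assertEqual(expected, stack.size())
        self.assertEqual(expected == 0, stack.is_empty(),
                         "is_empty() disagrees with size()")

    def test_new_stack_has_size_zero(self):
        self.assert_size(Stack(), 0)

    def test_push_increments_size(self):
        stack = Stack()
        stack.push("a")
        self.assert_size(stack, 1)
```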
Two key things to remember:
  • Never skip running the tests to ensure you get a red bar.
  • Pause and think any time you get a premature pass.

Simplify Design With Zero, One, Many

Programmers have to consider cardinality in data. For instance, a simple mailing list program may need to deal with people having multiple addresses, or multiple people at the same address. Likewise, we may have a number of alternative implementations of an algorithm. Perhaps the system can send an email, fax a PDF, send paper mail, send an SMS or MMS, or post a Facebook message. It's all the same business, just different delivery means.

Non-programmers don't always understand the significance of these numbers:

Analyst: "Customers rarely use that feature, so it shouldn't be hard to code."

Program features are rather existential: they either have to be written or they don't. "Simplicity" is largely a matter of how few decisions the code has to make, not how often it is executed.

The Rule of Zero: No Superfluous Parts
We have no unneeded or superfluous constructs in our system.
  • Building to immediate, current needs keeps our options open for future work. If we need some bit of code later, we can build it later with better tests and more immediate value. 
  • Likewise, if we no longer need a component or method, we should delete it now. Don't worry, you can retrieve anything you delete from version control or even rewrite it (often faster and better than before).

The Rule of One: Occam's Razor Applied To Software
If we only need one right now, we code as if one is all there will ever be.

  • We've learned (the hard way!) that each piece of code should exist only once. That part of the rule is obvious, but sometimes we forget to apply "so far" to the rule. Thinking that you might need more than one in a week, tomorrow, or even in an hour isn't enough reason to complicate the solution. If we have a single method of payment today, but we might have many in the future, we still want to treat the system as if there were only going to be one.
  • Future-proofing done now (see the "options" card) gets in the way of simply making the code work. The primary goal is to have working code immediately. 
  • When code originally written to support multiple classes is later reduced to a single class, we can often simplify it by removing the scaffolding that made "many" possible. This leaves us with No Superfluous Parts, which makes the code simple again.

The Rule of Many: In For a Penny, In For a Pound
Code is simpler when we write it to a general case, not as a large collection of special cases.

  • A list or array may be a better choice than a pile of individual variables, provided the items are treated uniformly. Consider "point0, point1, point2": exactly three variables, hard-coded into a series of names with number sequences. If they had different meanings, they would likely have been given different names (for instance, X, Y, and Z). What is the clear advantage of saying point0 instead of point[0]?
  • It's usually easier to code for "many" than for a fixed non-zero number. For example, a business rule requiring exactly three items is easily managed by checking the length of the array, and not so easily managed by coding three discrete conditionals. Iterating over an initialized collection also eliminates the need for null checking when it contains no elements.
  • Non-zero numbers greater than one tend to be policy decisions, and likely to change over time.
  • When several possible algorithms exist to calculate a result, we might be tempted to use a type flag and a case statement, but if we find a way to treat implementations uniformly, we can code for "many" instead of "five." This helps us recognize and implement useful abstractions, perhaps letting us replace case statements with polymorphism.
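The Python sketch below illustrates two of the Rule of Many points under stated assumptions: every class and function name is hypothetical. A single length check on a list replaces three discrete variables and conditionals, and the delivery mechanisms from the earlier mailing example are treated uniformly through polymorphism instead of a type flag and case statement.

```python
from abc import ABC, abstractmethod


def require_three_points(points):
    """Hypothetical business rule: exactly three points are required.

    One length check on a list replaces three discrete conditionals
    on point0, point1, and point2.
    """
    if len(points) != 3:
        raise ValueError(f"expected 3 points, got {len(points)}")
    return points


class Delivery(ABC):
    """Uniform interface for every delivery means (hypothetical)."""

    @abstractmethod
    def send(self, message: str) -> str:
        ...


class EmailDelivery(Delivery):
    def send(self, message: str) -> str:
        return f"email: {message}"


class SmsDelivery(Delivery):
    def send(self, message: str) -> str:
        return f"sms: {message}"


class FaxDelivery(Delivery):
    def send(self, message: str) -> str:
        return f"fax (PDF): {message}"


def notify(channels, message):
    """Works for zero, one, or many channels, with no type flag and no null checks."""
    return [channel.send(message) for channel in channels]
```

For example, notify([EmailDelivery(), SmsDelivery()], "hi") returns ["email: hi", "sms: hi"], and notify([], "hi") simply returns an empty list. Adding a new delivery means is a new subclass rather than a new case.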
Naturally, these aren't the only simple rules you will ever need. But simple, evolutionary design is well supported by the ZOM rules regardless of programming language, development methodology, or domain.

The "Flash a Friend" Contest: A Covert Agile Give-Away!

If you're reading this blog, you're probably a believer that a good agile process can make a difference. And maybe you've recognized someone on your team, on another team, or even in a different company that you think would benefit from a little covert mentoring.

We'd like to help! We believe getting these cards in the hands of the right people can make a real difference. We're willing to put that belief in action.

Here's how it works:
  • Email us at, recommending one person who you think should receive a free deck. You don't have to name names; you can say "my boss," "our architect," "my dog," "my cousin," etc. You can even name yourself!
  • Tell us in one short, pithy line why you think that this person/team would benefit from Agile in a Flash. 
  • We'll read the comments and pick our favorites.
  • If your entry is selected, we will contact you and get the particulars (names, addresses).
  • The person you recommended gets a deck of Agile in a Flash from us. No note, no card, no explanation.  
  • To thank you for being so helpful, we send a second deck to you!
  • We'll put the winning comments on a soon-to-be-published Agile in a Flash blog entry. (You can choose to be attributed or anonymous.)
Deadline for entries: Friday June 15, 1200 MDT

Seven Steps to Great Unit Test Names

You can find many good blog posts on what to name your tests. We present instead an appropriate strategy for when and how to think about test naming.
  1. Don't sweat the initial name. A bit of thought about what you're testing is essential, but don't spend much time on the name yet. Type in a name, quickly. Use AAA or Given-When-Then to help derive one. It might be terrible; we've named tests "DoesSomething" before we knew exactly what they needed to accomplish. We've also written excessively long test names to capture a spewed-out train of thought. No worries: you'll revisit the name soon enough.
  2. Write the test. As you design the test, you'll figure out precisely what the test needs to do. You pretty much have to; otherwise you aren't getting past this step! :-) When the test fails, look at the combination of the fixture name, test method name, and assertion message. These three should (eventually) uniquely and clearly describe the intent of the test. Make any obvious corrections, like removing redundancy or improving the assertion message. Don't agonize about the name yet; it's still early in the process.
  3. Get it to pass. Focus on simply getting the test to pass. This is not the time to worry about the test name. If you have to wait any significant time for your test run, start thinking about a more appropriate name for the test (see step 4).
  4. Rename based on content. Once a test works, you must revisit its name. Re-read the test. Now that you know what it does, you should find it much easier to come up with a concise name. If you had an overly verbose test name, you should be able to eliminate some noise words by using more abstract or simpler terms. You may need to look at other tests or talk to someone to make sure you're using appropriate terms from the domain language.
  5. Rename based on a holistic fixture view. In Eclipse, for example, you can press Ctrl+O to bring up an outline view showing the names of all related tests. However you review the test names, make sure your new test's name is consistent with the others. The test is a member of a collection, so consider the collection as a system of names.
  6. Rename and reorganize other tests as appropriate. Often you'll question the names of the other tests. Take a few moments to improve them, with particular focus given to the impact of the new test's name. You might also recognize the need to split the current fixture into multiple fixtures.
  7. Reconsider the name with each revisit. Unit tests can act as great living documentation, but only if intentionally written as such. Try to use the tests as your first and best understanding of how a class behaves. The first thing you should do when challenged with a code change is read the related tests. The second thing you should do is rename any unclear test names.
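As a small illustration of steps 2, 4, and 5, here is a hypothetical Python fixture; the initials function and all test names are invented for the example. Read the fixture name, method name, and assertion message together: each combination should describe one intent, and the names should read as a consistent set.

```python
import unittest


def initials(name):
    """Hypothetical function under test."""
    return "".join(word[0].upper() for word in name.split())


class InitialsTest(unittest.TestCase):
    # The step 1 placeholder was "test_does_something"; step 4 renamed it
    # once re-reading the finished test made its purpose obvious.
    def test_uppercases_first_letter_of_each_word(self):
        self.assertEqual("AL", initials("ada lovelace"),
                         "each word should contribute one uppercase letter")

    # Step 5: the new name reads consistently alongside its neighbors.
    def test_returns_empty_string_for_blank_name(self):
        self.assertEqual("", initials("   "),
                         "whitespace-only input has no words")
```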
The test names you choose may seem wonderful and clear to you, but you know what you intended when you wrote them. They might not be nearly as meaningful to someone who wasn't involved with the initial test-writing effort. Make sure you have some form of review to vet the test names. An uninvolved developer should be able to understand the test as a stand-alone artifact, without having to consult the test's author (you). Even if you're pair programming, it's still wise to get a third set of eyes on the test names before integrating.

Unit tests require a significant investment of effort, but renaming a test is cheap and safe. Don’t resist incrementally driving toward the best name possible. Continuous renaming of tests is an easy way of helping ensure that your investment will return appropriate value.

Is Your Unit Test Isolated?

(Kudos to the great software guru Jeff Foxworthy for the card phrasing.)

An effective unit test should follow the FIRST prescriptions in order to verify a small piece of code logic (aka “unit”). But what exactly does it mean for a unit test to be I for Isolated? Simply put, an isolated test has only a single reason to fail.

If you see these symptoms, you may have an isolation problem:

Can't run concurrently with any other. If your test can’t run at the same time as another, then they share a runtime environment. This occurs most often when your test uses global, static, or external data.

A quick fix: Find code that uses shared data and extract it to a function that can be replaced with a test double. In some cases, doing so might be a stopgap measure suggesting the need for redesign.
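A minimal Python sketch of this quick fix, with hypothetical names throughout: the global _EXCHANGE_RATES table stands in for shared data, current_rate is the extracted function, and passing rate_lookup as a parameter is one assumed way (not the only one) to let a test substitute a double.

```python
import unittest

# Shared, global state: tests that touch it directly can't safely run concurrently.
_EXCHANGE_RATES = {"EUR": 1.10}


def current_rate(currency):
    """Extracted seam: the only place that reads the shared data."""
    return _EXCHANGE_RATES[currency]


def to_usd(amount, currency, rate_lookup=current_rate):
    """Code under test; rate_lookup can be replaced with a test double."""
    return round(amount * rate_lookup(currency), 2)


class ToUsdTest(unittest.TestCase):
    def test_converts_using_looked_up_rate(self):
        def fake_rate(currency):
            # Test double standing in for the shared rate table.
            return 2.0

        self.assertEqual(20.0, to_usd(10, "EUR", rate_lookup=fake_rate))
```

Because the test never reads _EXCHANGE_RATES, it no longer competes with other tests for that shared state.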

Relies on any other test in any way. Should you reuse the context created by another test? For example, your unit test could assume a first test added an object into the system (a “generous leftover”). Creating test interdependencies is a recipe for massive headaches, however. Failing tests will trigger wasteful efforts to track down the problem source. Your time to understand what’s going on in any given test will also increase.

Unit tests should assume a clean slate and re-create their own context, never depending on an order of execution. Common context creation can be factored to setup or a helper method (which can then be more easily test-doubled if necessary). You might use your test framework's randomizer mode (e.g. googletest’s --gtest_shuffle) to pinpoint tests that either deliberately or accidentally depend on leftovers.
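A Python sketch of factoring common context into setup, assuming a hypothetical Inventory class: unittest calls setUp before every test method, so each test re-creates its own context and neither depends on a leftover from the other, regardless of execution order.

```python
import unittest


class Inventory:
    """Hypothetical class under test."""

    def __init__(self):
        self._items = {}

    def add(self, name, qty):
        self._items[name] = self._items.get(name, 0) + qty

    def quantity(self, name):
        return self._items.get(name, 0)


class InventoryTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test: each test starts from a clean slate
        # instead of relying on a "generous leftover" from another test.
        self.inventory = Inventory()
        self.inventory.add("widget", 3)

    def test_add_increases_existing_quantity(self):
        self.inventory.add("widget", 2)
        self.assertEqual(5, self.inventory.quantity("widget"))

    def test_unknown_item_has_zero_quantity(self):
        self.assertEqual(0, self.inventory.quantity("gadget"))
```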

You might counter that re-executing common setup for every test is wasteful and will slow your test run. Our independent unit tests are ultra-fast, however, so this is never a real problem. See the next symptom.

Relies on any external service. Your test may rely upon a database, a web service, a shared file system, a hardware component, or a human being who is expected to operate a simulator or UI element. Of these, the reliance on a human is the most troublesome.

SSDD (same solution, different day): Extract the methods that interact with the external system, perhaps into a new class, and mock that class.
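A Python sketch of the SSDD fix, using the standard unittest.mock module; WeatherClient and Advisor are hypothetical. The external web service call is isolated in WeatherClient, and the test replaces that class with a mock so it never touches the network.

```python
import unittest
from unittest import mock


class WeatherClient:
    """Hypothetical class extracted around the external web service."""

    def temperature_c(self, city):
        raise NotImplementedError("would call the real web service")


class Advisor:
    def __init__(self, weather_client):
        self._weather = weather_client

    def advice(self, city):
        if self._weather.temperature_c(city) < 10:
            return "bring a coat"
        return "no coat needed"


class AdvisorTest(unittest.TestCase):
    def test_recommends_coat_when_cold(self):
        # spec= makes the mock reject methods WeatherClient doesn't define.
        weather = mock.Mock(spec=WeatherClient)
        weather.temperature_c.return_value = 4
        self.assertEqual("bring a coat", Advisor(weather).advice("Oslo"))
        weather.temperature_c.assert_called_once_with("Oslo")
```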

Requires a special environment. “It worked on my machine!” A Local Hero arises when you write tests for a specific environment, and is a sub-case of Relies on any external service. Usually you uncover a Local Hero the first time you commit your code and it fails during the CI build or on your neighbor’s dev box.

The problem is often a file or system setting, but you can also create problems with local configuration or database schema changes. Once the problem arises, it’s usually not too hard to diagnose on the machine where the test fails.

There are two basic mitigation strategies:
  1. Check in more often, which might help surface the problem sooner
  2. Periodically wipe out and reinstall (“pave”) your development environment

Can’t tell you why it fails. A fragile test has several ways it might fail, which makes it hard for the test to produce a meaningful error message. Good tests are highly communicative and terse. By looking at the name of the test class, the name of the method, and the test output, you should know what the problem is:
CSVFileHandling.ShouldTolerateEmbeddedQuotes -
   Expected "Isn't that grand" but result was "Isn"

You shouldn't normally need to dig through setup code, or worse, production code, to determine why your test failed.

The more of the SUT exercised by your test, the more reasons that code can fail and the harder it is to craft a meaningful message. Try focusing your test on a smaller part of the system. Ask yourself “what am I really trying to test here?”

Your test might be failing because it made a bad assumption. A precondition assertion might be prudent if you are at all uncertain of your test’s current context.
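A Python sketch of a focused test in the spirit of the CSVFileHandling example, with a precondition assertion guarding the test's own assumption. The first_field function is hypothetical and relies on the standard csv module; the assertion messages are chosen so a failure explains itself without a trip through setup or production code.

```python
import csv
import io
import unittest


def first_field(line):
    """Hypothetical SUT: returns the first field of one CSV line."""
    return next(csv.reader(io.StringIO(line)))[0]


class CsvFileHandlingTest(unittest.TestCase):
    def test_should_tolerate_embedded_quotes(self):
        line = '"Isn\'t that grand",42'
        # Precondition assertion: fail loudly if the test's own assumption is wrong.
        self.assertIn("'", line, "precondition: input must contain an embedded quote")
        self.assertEqual("Isn't that grand", first_field(line),
                         "embedded quote truncated the field")
```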

Mocks indirect collaborators. If you are testing public behavior exposed by object A, and object A interacts with collaborator B, you should only be defining test doubles for B. If the tests for A involve stubbing of B’s collaborators, however, you’re entering into mock hell.

Mocks violate encapsulation in a sense, potentially creating tight coupling with implementation details. Implementation detail changes for B shouldn’t break your tests, but they will if your test involves test doubles for B’s collaborators.

Your unit test should require few test doubles and very little preliminary setup. If setup becomes elaborate or fragile, it’s a sign you should split your code into smaller testable units. For a small testable unit, zero or one test doubles should suffice.
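A Python sketch of doubling only the direct collaborator, with hypothetical names: Checkout plays object A and PriceCatalog plays collaborator B. The test needs exactly one double, for B, and knows nothing about whatever B uses internally, so changes to B's implementation details can't break it.

```python
import unittest
from unittest import mock


class PriceCatalog:
    """Collaborator B (hypothetical). Internally it might use a database,
    a cache, or a remote service; A's tests should not care which."""

    def price_of(self, sku):
        raise NotImplementedError


class Checkout:
    """Object A (hypothetical): its public behavior is what we test."""

    def __init__(self, catalog):
        self._catalog = catalog

    def total(self, skus):
        return sum(self._catalog.price_of(sku) for sku in skus)


class CheckoutTest(unittest.TestCase):
    def test_total_sums_catalog_prices(self):
        # One test double, for the direct collaborator only.
        catalog = mock.Mock(spec=PriceCatalog)
        catalog.price_of.side_effect = {"apple": 3, "pear": 5}.get
        self.assertEqual(8, Checkout(catalog).total(["apple", "pear"]))
```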

In summary, unit tests (which we get most effectively by practicing TDD) are easier to write and maintain the more isolated they are.