TDD Antipatterns

(font is Brianne's Hand)

James Carr enlisted a group of fellow travelers to define a list of TDD Antipatterns, errors in judgement common in TDD practice. He provided an initial list for us to base our work on, and did a very fine job of filtering out the duplicates and near-duplicates, providing catchy names, and writing up the result. Note that his list is longer than ours since we have a terseness constraint. Read the full list.

I wish I had thought of it first.

  • The Liar is a test that runs, but does not test what it claims to test. It could be named after a class, but actually be testing another. The test might be called ShouldNotThrowExceptionsForPositiveValues, but actually use the natural numbers less than 5. Liars give a false sense of security.
  • Excessive Setup is common when the architecture is badly coupled and mocking is not well-used. This is often evidence of insufficient pre-factoring. A little dependency injection and a little interface use can go a long way. This is also common when programmers give in to the urge to test software "in context." One hopes that when they get to the assert they'll remember what scenario they were testing.
  • The Giant is a single test that tests more than a single scenario. It may have excessive setup, followed by a large number of manipulations and assertions. It may test entire subsystems. Adding a new assertion requires a programmer to reverse-engineer his way through the Giant to find an insertion point, and requires the programmer to take care not to leave side-effects that will cause the last half of the Giant to fail. Giants are ticklish and hard to understand.
  • The Mockery is a real piece of work. A pair/team/programmer has actually replaced the system under test with a test double. The test proves only that the mocks worked as expected.
  • Generous Leftovers are left behind by tests that don't clean up after themselves, causing later tests to fail when run in the suite but not when run in isolation. The leftovers are typically in static memory, on disk, in a database (for shame!), or in some other persistent store. When leftovers are found, it's often puzzling whose they are and what they are.
  • Local Hero is a test that runs well, and tests a system well, but only passes on the author's machine or network. Being environmentally-sensitive, such tests fail for peers and CI systems. Typical failures involve hardcoded paths, locally-installed libraries, and OS-specific assumptions.
  • The Loudmouth is a test that produces copious output. It is like the boor at the party who thinks every trivial event in his life is worthy of an epic tale. At one time, the Loudmouth's story may have been worth hearing, but now it's just idle chatter that gets in the way of a real conversation with more interesting guests/tests.
  • The Secret Catcher is a test that seems to do nothing at all, but is secretly (implicitly) depending on any errors to produce exceptions. The fact that the code executed without exception is taken as evidence that the code works. Eventually, every programmer on the team will have to reverse-engineer these tests: failure of the test always results in code spelunking, an unpopular pastime.
  • The Hidden Dependency is a test that secretly requires setup that is not present in the test itself. Perhaps it requires a certain data setup script, or a change in a configuration file, or another test to have already run with generous leftovers. Hidden Dependency tests are a special kind of evil.
  • The Stranger is a perfectly good test in a perfectly wrong place. It is not a Liar; it's just not testing the same system (SUT) that the other tests are testing. Strangers don't cause problems except when you're looking for the tests for class X and one is hiding among the tests for class Y.
  • Success Against All Odds is a test that will always pass, no matter what. Due to a series of missteps, the test won't fail even if the code is wrong. Often these turn into complicated indirect versions of "assert true == true" or "false == false". These can be a variation on Mockery or The Liar. It is likely that a test with Success Against All Odds was written as a green test, without ever having seen the "red" part of the Red->Green->Refactor loop.
  • The Slow Poke is a test that takes too long to run. If you have 15000 tests, and you want them to run in less than 45 seconds, you have a budget of 0.003 seconds per test. A five-second test will cause some irritation. A few 10-second tests will make people think twice about running the tests at all. A test that takes a minute is unlikely to ever be run by a programmer. Imagine what hell awaits the author of some of the three-to-ten minute monstrosities that exist in the wild! In a TDD shop, slow pokes cannot be tolerated. Note that many slow pokes are also Giants with Excessive Setup. Be warned.
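To make a few of these antipatterns concrete, here is a minimal Python `unittest` sketch. The function `parse_quantity` and all test names are hypothetical, invented only for illustration. The first test class is a Secret Catcher: it passes merely because nothing raised. The second states what it expects, including the failure case. The third avoids Generous Leftovers by giving each test a private scratch directory that is removed afterward.

```python
import os
import shutil
import tempfile
import unittest


def parse_quantity(text):
    """Hypothetical code under test: parse a positive integer quantity."""
    value = int(text)
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


class SecretCatcherTest(unittest.TestCase):
    # Antipattern: no assertion. This "passes" as long as nothing raises,
    # even if parse_quantity returns the wrong number.
    def test_parse(self):
        parse_quantity("3")


class ExplicitTest(unittest.TestCase):
    # Better: state the expected result and the expected failure explicitly.
    def test_parses_positive_number(self):
        self.assertEqual(parse_quantity("3"), 3)

    def test_rejects_zero(self):
        with self.assertRaises(ValueError):
            parse_quantity("0")


class NoLeftoversTest(unittest.TestCase):
    # Each test gets its own temporary directory, deleted in tearDown,
    # so nothing persists to break (or prop up) later tests in the suite.
    def setUp(self):
        self.workdir = tempfile.mkdtemp()

    def tearDown(self):
        shutil.rmtree(self.workdir)

    def test_reads_quantity_from_file(self):
        path = os.path.join(self.workdir, "qty.txt")
        with open(path, "w") as f:
            f.write("7")
        with open(path) as f:
            self.assertEqual(parse_quantity(f.read()), 7)
```

Saved as a module, the suite can be run with `python -m unittest`. Note that the Secret Catcher passes too; that is exactly the problem, since it would keep passing if the parsing logic were wrong.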
Tim recommends that readers follow the link to the original article, which covers more territory than the index card can allow.

What is missing?

We have dozens more cards we can write and post, so we're not running short on ideas, even though we can be short on time to write them all up. Still, I wonder: if you could pick the next card for me, what would you want it to be about? Is there an area we've not addressed at all? Topics of interest to you that might be interesting to others? Guest authors we should contact? I'm all ears.

Pairing Workstation Configuration

(font is SD Marker still)

Teams beginning to use pair-programming often struggle because of poor workstation configuration as they attempt to use their individual programming space. There is more to it than merely adding a second chair. Take a moment to review Pair Programming Smells and recall that there are physical limitations as well as psychological limitations to overcome.

  • Chairs sit comfortably side-by-side so developers can have equal access. If either person is physically limited from grabbing the keyboard and mouse, then the setup is wrong. Pair programming is about editing code together, so both partners need equal access. Beware corners: if the monitor sits in a desk or cubicle corner, one person necessarily has better access than the other, and pairing breaks down.
  • Add an extra monitor (or two!), preferably a nice, thin LCD or plasma screen. An extra monitor can be placed where the pair members can both see it equally well. A thin screen doesn't need a desk corner. Also, all of the annoying popups (mail, chat, etc) can be placed on the monitor where the code is not displayed, where it can be ignored. Finally, it is useful to run a countdown timer on one screen while programming on the other, to enable pomodoro-like techniques. Pairing is best done in time boxes with regular breaks.
  • Get some USB keyboard(s) for pairing. You should have a comfortable keyboard with a long cord. If one of you requires/prefers an ergonomic keyboard, then carrying around their favorite keyboard is a reasonable concession.
  • One mouse per keyboard is best. This is again about equal access for editing.
  • Get docking stations for notebook computers because shoving the computer back and forth is a pain, and because docking stations can give you extra USB ports and VGA ports. There really is not a lot of cost involved, and it really helps keep the pairing "fair".
  • Pens, scrap paper, and index cards should be in reach. You might be surprised how often you need to take a note now and come back to it as soon as the code is passing all the tests again. With two people, there are more ideas to choose from, and each learns tricks from the other. Writing things down allows each partner to process them when he is not pairing.
  • IDE/editor of choice with shared configuration is a contentious bit. We need to use the same tools if we are to have equal access for editing. In some teams, the emacs/vim/eclipse/scite wars will erupt, but it is better if the team chooses and all members learn to get by in the chosen environment. If they won't decide which editor to use, how are they going to make group decisions about design and architecture? It is better to learn to use an "inferior" editor than to wrestle the editing environment from each other several times a session. When it comes to coding standards and standard editors, it is a good practice to "be a sport" and choose to make the team work better even if it costs you a little in the short term.

  • Pair programming is not a utopian practice. Some people don't enjoy it at first, and some never warm up to it. Regardless, it makes the code better. If your team wants to make the code better through pair-programming, then it is important that their attempts are not hobbled by a poor workstation configuration. For a little bit of money, a leader can make a big difference in the way a company puts out software.

    SMART goals

    Font: Stereofidelic

    The best lists never die. The SMART mnemonic has been around for at least half a century, per Wikipedia. SMART is similar to INVEST, evaluating criteria for goals and objectives instead of for stories.

    SMART has generated many different word expansions over the years. Sometimes M is meaningful, manageable, or even motivational(!). Our Agile in a Flash card uses the supposedly preferred words. But as a result of this inconsistency, I struggled yesterday to recall the best choice for the letter R. "Realistic?" No, that's too close in meaning to attainable. A quick search revealed relevant (duh!). I supplanted my mild self-annoyance (at my inability to remember the better choice) with elation at the prospect for a new, relevant agile card!

    In the context of agile, I've found SMART goals to be useful when discussing action items to come out of retrospectives. True, there's no reason these couldn't be treated just like INVEST stories. But I've found that selling something half-a-century-tried-and-true can be a little easier with some crowds.

    Here are my thoughts about the relevance of the SMART criteria with respect to retrospectives.

    • Specific - Vague promises of improvement usually don't generate results. Think of the 5 W's: who, what, when, where, and why. Instead of "we'll try to get stories done earlier in the iteration," how about "the developers will deliver at least one story every two to three days to QA, who will complete testing of them within a day of delivery, so that we can ensure stories are 'done done' by iteration end." (And don't forget that "try" is a word you want to banish.)

    • Measurable - Attainment of our specific example goal might be validated by answering some questions: What was the average number of stories completed within two to three days? How many stories did not complete in this time? You can think of iterations as fixed time periods in which to run experiments: for each experiment, state a hypothesis and capture the relevant data that will confirm or disprove it.

    • Attainable - It's important that a team can check off completed goals, to reinforce the sense of achievement. Obviously goals that your team can complete in an iteration best meet this criterion, but you don't want to have only short-term goals. There's nothing wrong with long-term goals; just make sure there's a way to measure incremental progress.

    • Relevant - Too many trivial goals can give a bloated sense of achievement. Shortening daily stand-up meetings by limiting them to five minutes might seem beneficial, but does it really change anything? What's the real problem? Don't hesitate to attempt dramatic changes, and don't hesitate to think outside the box that pseudo-agile dogmatists might otherwise paint you in.

    • Time bound - As with stories, many teams have a problem with letting goals creep past iteration boundaries. "We just need a little more time." Set up the experiment, define completion and success criteria, and grade the experiment: it was either completed or abandoned, and the hypothesis either held true or was disproved.
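The Measurable questions for the example goal can be answered with very little tooling. Here is a minimal Python sketch, assuming the team logs the day each story was delivered to QA and the day QA finished testing it. The story IDs and day numbers are made-up sample data for illustration, not real results, and the thresholds (three days between deliveries, one day of testing) come from the Specific example above.

```python
# Hypothetical iteration log: (story, day delivered to QA, day QA finished testing),
# with days counted from the start of the iteration.
log = [
    ("S-101", 2, 3),
    ("S-102", 4, 5),
    ("S-103", 8, 10),
    ("S-104", 10, 10),
]


def delivery_gaps(entries):
    """Days between consecutive deliveries to QA, counting from day 0."""
    days = [0] + sorted(delivered for _, delivered, _ in entries)
    return [later - earlier for earlier, later in zip(days, days[1:])]


def slow_testing(entries, max_days=1):
    """Stories that QA took longer than max_days to finish testing."""
    return [story for story, delivered, tested in entries if tested - delivered > max_days]


gaps = delivery_gaps(log)
print("gaps between deliveries:", gaps)                   # [2, 2, 4, 2]
print("gaps over 3 days:", sum(g > 3 for g in gaps))      # 1
print("QA took more than a day:", slow_testing(log))      # ['S-103']
```

At the next retrospective, the team grades the experiment against these numbers rather than against a general impression of how the iteration felt.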

    Get SMART today!

    Team Smells for Coaches

    Every agile coach will have a number of factors they monitor constantly. Many are barely aware of the "smells" they are looking for, but will have a constant awareness of issues that indicate that a transition is not progressing well. We provide a short list of common smells that indicate your agile project is not going as well as you hoped:

    • Heaping piles of unfinished work will appear when assigned work is not being finished as quickly as new work is being assigned. This is generally a problem when the velocity is not being respected. The Theory of Constraints tells us that we have to subordinate the business to the bottleneck, a simple idea which is very hard to sell. Work may pile up in development or QA, especially if it is also piled up in sales.

    • Individual work assignments are typical in non-agile shops, but agile developers team up and pair on production code. Managers may demand that individuals are assigned work prior to starting an iteration so that they may track the productivity of individuals. This is a disincentive for agile teams, because pairing with a colleague risks the work one is personally assigned. By pitting team members against each other in the fight for individual rankings, work assignments can destroy true productivity.

    • Back-channeling is occurring if certain interests outside the development team (Customer or outsider) go directly to programmers and give them individual work assignments. This prevents the programmer from applying his abilities to the completion of the iteration. It also damages true productivity as others are required to pick up the slack, and the targeted programmer is forced to task-switch unpredictably. Back-channeled communications are a complex flow with political complications that prevent a team from doing their best work. It is better to work with a simpler, transparent management and communication structure. Note that sometimes back-channeling is an attempt by a leader whose work has been de-prioritized to get it pushed through anyway in defiance of the management structure.

    • Blame-avoidance behaviors cause programmers to fear refactoring, redesign, and any important practice outside of a narrow job description. In fearful organizations, blame-avoidance prevents both productivity-wasting activities and productivity-enhancing activities. Fear is a reason to stay wrong, or at least to be right only in familiar ways.

    • The urge to matrix-manage the team is a result of a lack of trust in the team by outside forces. When any team is under pressure, certain leaders may seek to control other teams. When managers from outside a team try to manage it by force or fiat instead of through normal channels (planning, prioritizing, etc.), it is clear that there are unaddressed problems in the management chain. These problems need to be resolved before they break down the team.

    • Cargo-cult ceremonies are ceremonies carried out for the sake of the ceremony, rather than performed for good effect. Examples are planning ceremonies where estimates are not accepted and stories are not scoped to the iteration, scheduled retrospectives where no problems are really solved, iteration boundaries where work is carried over to the next iteration with full credit for "completion", etc. These are signs that the team is "seeming" agile, rather than "becoming" agile. It shows a lack of real change in values.

    • Test negligence is a particularly nasty sign of a team's lack of agility. It shows that work is not being test-driven, and that whatever pairing is occurring does not support the agile practices. Agile teams depend on copious automated tests to accelerate their production. If tests are not being written, or not being run, then the team is not going to expand its capacity to produce. This commonly happens in conjunction with back-channeling and/or piles of unfinished work. Skipping testing is a time-honored way to seem to go faster while actually making the software harder and harder to modify.

    • Guarded speech is a sign that the real values and the spoken values of the team are not the same. It may be that the team is trying to be agile, but certain leaders or managers are not willing to hear it, or else that management has bought into Agile development and the developers are not buying it. It may be that some barrier to productivity is being held in place by the political or personal power of some leaders. Barriers to transparency tend to be political. It may be necessary to change the team's political situation so that they may work openly.

    Acceptance Tests

    Font: Complete In Him

    The word "acceptance" implies that we have a high level of confidence that we can ship quality product. The goal for acceptance tests is to provide self-verifying, self-documenting, executable examples of how the system is intended to be used. In a rough sense, they are use cases taken to the next level.

    • Define “done done” for stories. The acceptance tests for a story provide a contract for completion. As programmers, we know we are truly done when the code passes all of the acceptance tests. As consumers of the system, we know we can accept it when all of its tests pass. Progress in an iteration is often best tracked by the number of acceptance tests passing.

    • Are automated. Our Automating Tasks card provides detailed recommendations on when and when not to automate. Most importantly, this directive doesn't preclude the existence of tests that aren't automated!

    • Document all uses of the system. Think about the tests as examples that demonstrate valid uses of the system. This mindset will help you develop tests that can be read and understood by people wanting to understand the system. Such documentation never becomes stale (although it requires good effort to design the tests to be so expressive).

    • Don't just cover "happy paths." Stories usually require a family of tests to comprehensively cover alternate cases and exceptional situations. Inevitably, you will ship a defect that an automated test would have prevented. This previously "missed" acceptance test now becomes part of the complete test suite.

    • Do not replace exploratory tests. We will always want some manual testing above and beyond the tests that define acceptance criteria for a new story. Exploratory testing highlights the more creative aspects of how a user might choose to interact with a new feature. It also helps teach testers how to improve their test design skills. Don't forget to knee test!

    • Run in a near-production environment. How many nickels have you earned for hearing the phrase "It worked fine on my machine!"? Acceptance tests must execute in an environment that emulates production as closely as possible. This means they hit a real database and make real external API calls wherever possible. By their nature, then, acceptance tests are slow.

      Still, minute to minute, I might want my suite of acceptance tests to run as fast as possible, as part of a continuous integration build. As a developer, I want rapid feedback, to provide high short-term confidence that I can move on. I know we'll still run the full suite in a proper environment overnight; having this fallback allows me to look at tactics such as using an in-memory database.

    • Are defined by the customer. Hopefully we know who the customer is. Agile acceptance tests are an expression of customer need (aka requirements). While all parties can contribute to statements of requirements, it's important that a "single customer voice" defines their interests as an unambiguous set of tests. (This also means that it's perfectly ok for a programmer to help with testing--they just can't be the one defining the tests.)
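The in-memory database tactic and the advice to cover more than the happy path might combine as in the minimal Python sketch below. `AccountService`, the `accounts` table, and the `ACCEPTANCE_DB` environment variable are all hypothetical names for illustration. SQLite stands in for whatever database production actually uses, so a truly near-production nightly run would point at that engine rather than at a SQLite file.

```python
import os
import sqlite3
import unittest

# Hypothetical switch: fast CI builds default to an in-memory SQLite database;
# a nightly run could set ACCEPTANCE_DB to a more production-like location.
DB_LOCATION = os.environ.get("ACCEPTANCE_DB", ":memory:")


class AccountService:
    """Hypothetical system under test for a story such as
    'A registered customer can look up their balance.'"""

    def __init__(self, conn):
        self.conn = conn

    def register(self, name, balance):
        self.conn.execute(
            "INSERT INTO accounts (name, balance) VALUES (?, ?)", (name, balance))

    def balance_of(self, name):
        row = self.conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (name,)).fetchone()
        if row is None:
            raise KeyError(f"no account for {name}")
        return row[0]


class LookUpBalanceAcceptanceTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(DB_LOCATION)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts (name TEXT PRIMARY KEY, balance INTEGER)")
        self.conn.execute("DELETE FROM accounts")  # start each test from a known state
        self.service = AccountService(self.conn)

    def tearDown(self):
        self.conn.close()  # uncommitted changes are discarded

    def test_registered_customer_sees_balance(self):  # happy path
        self.service.register("alice", 100)
        self.assertEqual(self.service.balance_of("alice"), 100)

    def test_unknown_customer_is_rejected(self):  # exceptional case
        with self.assertRaises(KeyError):
            self.service.balance_of("bob")
```

Keeping the database location in one place is what lets the same tests serve both the fast continuous-integration build and the slower overnight run.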

    I'm often asked, "What about regression tests?" Most people define regression tests as "tests that make sure we didn't break something already in the system." My answer is to imagine that we had built this entire system in an agile, acceptance-test-driven fashion, i.e., shipping only code that passes pre-defined acceptance tests. In this world, regression testing can happen every iteration: we simply run the entire suite of automated tests for all stories delivered to date.