Writing Characterization Tests




Source: Feathers, Michael. Working Effectively With Legacy Code (WELC), Prentice Hall PTR, 2005.


Test-after development (TAD) is tough; dependencies and non-cohesive code can make for a significant challenge. That's why Michael Feathers' book Working Effectively With Legacy Code is so useful.


Does working with legacy code really require a different set of skills than doing TDD? Well, sprout method and sprout class are the first avenues you should always explore, and they are strictly about test-driving the new code. But most of the other WELC techniques require less test driving and more digging about, trying different things. Some of the techniques even border on being too clever--they're about problem solving, and you do what you gotta do!


An important step in modifying existing code is understanding what it does before you make any changes. Feathers says characterization tests are like putting a vise around code: You want to pin it down before you attempt changing it, so that you know if it slips a bit when you do. So in a sense, we are back to test-first.


How many characterization tests do you need to write? You can read the WELC chapter on effects analysis, and start to get scared that you're going to have to write a lot of tests. Or you can use confidence as a guideline, writing as many tests as you think you need to understand the code you're about to change.


The flash card describes what are pretty obvious steps, but they back one of the underlying themes in Feathers' book: be methodical, be safe.

Story Format



Well, it had to happen sooner or later: a card that bothers me quite a bit. Mike Cohn's book Agile Estimating and Planning, a book I recommend for just about anyone working on an agile project, popularized this story card format.


"As an actor, I want to accomplish some goal, for some reason"--very similar to what we did with use cases in the early 90s. And back then, we spilled time bickering over format and wording, so I give credit to whoever came up with a template to help sidestep those wasteful arguments.


The problem is that the story card ain't the thing, it's a placeholder and "promise for more communication," per Ron Jeffries. The card could have one word on it, and that would be sufficient. Bob Koss often told a story about a story for the U.S. Navy regarding how to adjust large gun firing based on sonic feedback. The story card said simply "boom splash!" That was enough to initiate talking about it, and no one forgot who the story was for, what it meant, or why it existed over the course of the iteration.


I realize I'm railing against a potentially useful flash card. As a tool, the format template can help us considerably in improving our collections of stories. Some same lessons from use cases apply: Thinking about the actors involved (the "as a" people) can help trigger the introduction of important stories that might be missed. The phrase "I want to" reinforces that stories are goals for customers the system, and not just technical pipe dreams. And "so that," well, as we write out dozens of stories in release planning, we'll want to remind ourselves of the rationale behind certain stories (but not all of them!). And sometimes this "why" can trigger other useful considerations.


To be fair to Cohn, he says many similar things in his book. But you know how people are. I've already encountered spreadsheets and other software tools that rigidly insist on this format. Never mind that the actor is the same for every last story, or that the reasons are pretty obvious for most stories, or that typing all that stuff is just redundant crud that we would stamp out if it were in our code.


Remember: The cards we present as part of the Agile In a Flash project are tools, not gospel. They're here to help prod you if you get stuck, to give you ideas, and to give you guidance. In most cases, you should follow them unless you have darn good reasons--and even then, you should follow them until you know why the rules exist, and only then should you consider taking your own path.

Breaking Down Larger Stories



Source: Jeff Langr, Tim Ottinger; also, Cohn, Michael. Agile Estimating and Planning.

One ideal for agile development would be to be able to deliver a "done done" story every day. We get a lot of resistance on this thought, and we push right back. Our goal is not to insist on the ideal, but instead to get you to move toward that goal, and not to defend stories that we think are too large. "Too large?" There's no consensus. Anything taking over a half iteration undoubtedly should be scrutinized.


One way to deliver stories more frequently is collaborate a bit more. Instead of one developer working a large story alone (sadly common), consider adding another developer or two or three. If they can help deliver the story sooner without inordinate overhead costs (including coordination efforts to avoid stepping on each other's toes), go for it.


The other route is to split stories. Breaking down larger stories is an often challenging proposition, but the more you do it, the easier it gets. Next time you have a large story, step through the list on the card.


The most challenging stories are "iceberg" stories. These are stories where the customer sees only a small impact, but the algorithmic or data complexity required to implement the story is large.


Sometimes a split isn't worth it. For example, you look to consider an alternate case as a separate story, but the developer says, "well, it's only going to take me ten minutes to implement that."


A key thing to think about when looking to split stories is the tests.


  • Defer alternate paths / edge cases - If anything, thinking about how to split stories around alternate cases will force you to make sure you've captured all of them!

  • Defer introducing supporting fields - If the story involves user input of good-sized chunks of data, support a few key and/or significant fields and introduce the remainder later.

  • Defer validation of input data - Demonstrate that you can capture the information; add the ability to prevent the system from accepting invalid data later.

  • Defer generation of side effects - For example, creating a downstream feed when the user updates information.

  • Stub other dependencies - "Fake it until you make it."

  • Split along operational boundaries (e.g. CRUD) - This is directly from Cohn's book. Note that you don't necessarily have to have "C" (Create) done before you implement "U" (Update).

  • Defer cross-cutting concerns (logging, error handling) - This generates an interesting challenge around an important point--devising acceptance tests that verify the addition of robustness concerns.

  • Defer performance or other non-functional constraints - Sometimes it's possible to bang out an implementation using an overly simplistic algorithm. A similar challenge: How do you devise acceptance criteria for the performance improvement?

  • Verify against audit trail that demonstrates progress - I'm not enamored with this one, but sometimes there are no other obvious solutions. Sometimes this will expose more implementation details than you would like. Perhaps these tests can disappear once the entire story gets completed.

  • Defer variant data cases - Will it simplify the logic if we don't have to worry about special data cases? Or more complex data variants? For example, it's probably a lot easier to devise a delivery scheduling system that supports only one destination.

  • Inject dummy data - In some cases, data availability or volume can be a barrier to full implementation of a story.

  • Ask the customer! - You may be pleasantly surprised!



Well, this list isn't necessarily complete, and needs a bit of work. What other story splitting mechanisms have you used successfully?

Refactoring



Refactoring is a large topic with enough material to fill a few books. In fact, it has filled a few good books. Clearly we don't have room for all of that on an index card but at least we can provide some meta-guidance on the topic.

Done only with a green bar (all tests passing) never with failing tests. No matter how much you want to rename, extract, or modify the code you really should have green (even an ugly green) before you try to refactor.

Always run the tests before and after the refactoring steps. Make sure you know that the system worked before and after each step. Check before, so you aren't confused about origin of breakage, and after so that you know you've done a good job. Skip this at the peril of having to use debugging or having to revert a set of changes. Always have a single reason to fail.

Refactor incrementally: have only one reason to fail. This cannot be over-recommended. Michael Feathers describes programming in his book as "the art of doing one thing at a time", which has been a very good bit of advice through the years. There is a good reason that the refactoring book gives recipes of tiny steps. This incrementalism is worthwhile and hard-won habit.

Add before removing when you are replacing a line. For instance, if there is a complex calculation and you are replacing it with a function that calculates the same result, place the new call & variable assignment right after the old calculation. Run the tests. If the result is different, comment out the call and try again. Don't delete the original until you know for sure that the new code is equivalent.

If you move a method to a new class, you must move or create a test. Otherwise your tests become increasingly indirect. Indirect tests do not produce a good smell.

Changing functionality is not refactoring. That would be correction or implementation. Calling a functional change a "refactoring" is not entirely honest. Refactoring does not change what the code does, it only changes where the code is located.

Increasing performance is a story, not refactoring. This is another misuse of the term "refactoring." The point of refactoring is to improve the structure of the code. While performance can accidentally result from simplifying the code, it is the simplification and not the performance that is the object.

The books outlined above will give a better idea of how to perform refactoring. At least the index card can help remember the circumstances in which one should perform the task.

Collective Code Ownership



See Extreme Programming's writeup on the rule of collective ownership.

Next to pairing, the Agile practice that causes consternation among developers is the idea of collective code ownership. It is not a difficult concept to grasp, but individual ownership is a difficult idea to let go. Developers inclined or acclimated to working solo find this four-bullet card quite threatening, and the motto above it positively chilling.

But what if we were truly working as a community? We would give up having the final say about how our code is formatted and how variables were named. We might find other people's pet peeves getting equal say. We might find that other people can take over the code we wrote. We might be able to be sick for a day or go on vacation without work ("that only we can do") piling up. We might be able to simplify the team's work by having a single work queue. We might find that we can help make other people's code better. We might find that they can improve ours. But we have to get over the idea that the code is ours.

It's not ours. It really belongs to the customer or company the second it rolls off our fingers. It won't follow us home, and we can't take it with us when we leave. It is copyrighted by the company as a work for hire. Let's not lose sight of that nugget of reality.

Anyone can modify any code at any time so that nobody has to seek permission to simplify and speed the code, or to add features that might touch code they don't own. Tests can be written because code can be made testable. This frees the development team from fear of a certain variety of personal conflict getting in the way of getting real work tested and delivered.

The team has a single style guide and coding standardnot because some arrogant so-and-so pushed it down their throat, but rather the team adopts a single style so that they can freely work on any part of the system. They don't have to obey personal standards when visiting "another person's" code (silly idea, that). They don't have to keep more than one group of editor settings. They don't have to argue over K&R bracing or ANSI, they don't have to worry about whether they need a prefix or suffix wart. They can just work. When silly issues are out of the way, more important ones take their place.

Original authorship is immaterial because "it's my code" has never been a valid reason to avoid improving or modifying a system that doesn't work or which doesn't do enough. In agile teams, we try not to talk about the original author when we are working on code. It simply is. It had an original intent, and we hope to strengthen it and make it more clear. The code exists and works. We want it to work better. This focus on the behavior of the code over the personality of the author is freeing as well. We can learn together once we let the guards down.

Plentiful automated tests increase confidence and safety so that we can learn while coding. If code is test-driven, every violation of the original intent (as expressed in the tests) is met with a test failure. Rather than pre-worrying every possible code change, the developer pair can make progress in short order and can refactor without fear. This is better for the company, better for the programming pair, better for the individual.

Reading over this card, I wonder if we missed the more important point that work does not have to queue for individuals. Too often we find work piled up for one or two siloed developers while two more are nearly idle because their silos are not busy. This is inevitable when our team structure involves multiple queues, and it is a shame. If we broke down the silos and followed pairing, testing, and collective ownership practices for just a little while we would find ourselves in a situation where any number of developers would be able to pick up the jobs that are waiting.

Examine a few common types of situations:

Scenario 1: Fred can produce automated filing reports in a day. Abe and Margaret together could get it done in two and a half days.. It makes sense to have Fred do this work, yes? But it's Wednesday, the work is due Friday, and Fred has 16 hours of more important work that is also queued for him. Now it seems foolish to assign Fred, because it will cause us to miss a release or else Fred will work copious overtime while Abe and Margaret go to the bar at 3:30pm. A good deal for Abe and Margaret, bad for Fred and risky for the company.

Scenario 2: Fred has three days of work to get done, and only two days to complete it. Nobody else can help, because Fred has always maintained this system alone. His wife calls to tell him that their newborn has pneumonia. Should he work late and fail as a father, or go home and fail as an employee?*

Scenario 3: Abe and Margaret already have worked with Fred in the automated filing area many times over the past three months, and feel pretty comfortable there. There are a lot of tests, and these tests help them to avoid checking mistakes into the code line. Fred has three days of work to be done, and a new two-day assignment was added to the pile. Margaret feels pretty comfortable with the new work, and should be done in two days. Fred and Abe team up on the remaining work, and at the end of the slightly-longer-than-usual day they're half-done. They should be able to finish up tomorrow. Fred's had bad Thai food. He begs off and dashes home. Charlie comes in to pair with Abe and Margaret as needed so that the release can finish on time.

This is an argument not only that a team should pair and explore the project's code, but also that they should be doing so at all times. The team should always be able to cover for members who are incovenienced by family emergencies or personal illness.

Some common objections:

It is dehumanizing and will reduce good programmers to undifferentiated ”cogs”. It has been our experience that it improves the human side of the operation, and that great programmers are not great programmers because they keep a walled garden of code working, but because they are good at programming. Why hide greatness under a bucket? Why limit a great programmer to a small corner of a system if you want the system to be great? Wouldn't a great programmer be better recognized by peers if they get to share in his great work?

It won't work in a shared trunk/branch. It does, all the time. The developers will share code more often, which means that they will integrate changes into their branches sooner rather than later. By sharing more, they have smaller and simpler integrations. I

It doesn't work in the face of refactoring and renaming code. You would think, yet somehow it still does. Common code and frequent integration make this largely a non-issue. And if people must work in the same small area of code, they can locate closer together and talk over their changes on-the-fly to make integration easier yet.

It will be hard to get any design done. A team does settle into shared design with surprisingly little thrash. The idea that good design is the work of a single enlightened individual and that it cannot be understood or appreciated by his peers is largely unsupported. An incremental project does small acts of design all of the time, and often hold impromptu stand-up meetings to talk over changes that impact the system on a larger scale.

You can't really get people to do it. However unlikely it seems, many work precisely this way each day. More surprisingly, people come to truly love the flexibility and expanded influence that collective code ownership gives them. This is particularly true in agile shops where they practice pair programming and unit testing.


*Rhetorical question. You always choose the child. Of course, you may still face dire consequences at the job. Personally, I would hate to be the manager who has to tell clients that he can't make the release date because he only has one programmer competent to do the work. Who should really be in trouble here?

INVEST

Source (tentative): Cohn, Mike. Agile Estimating and Planning. [Anyone know the original source? ]


The original XP take on what made a good story was that it had to have business value, it had to be estimable, and it had to be testable. That might have made for a simpler and also appropriate acronym, VET. But the three newer elements (I, N, S) add considerable value in helping you shape a candidate story into a real story that can be accepted in an iteration.



  • Independent - Any story could be the next one done; the customer should have the final say. As stories complete, some may become cheaper and others more expensive. Tim recommends estimating all stories as if they were first, and re-evaluating estimates before iteration planning game.

  • Negotiable - A story is not a contract! It is a "promise for communication," as we used to say. It shouldn't be flush with every last detail.

  • Valuable to the customer - Don't create technical stories! A primary goal of using stories is to demonstrate to the customer that we can deliver business value incrementally, so that the customer can help steer us and provide us with feedback. I'll repeat, because it's very important: Don't create technical stories!

  • Estimable - A story might be impossible to estimate if it's too big, or if we have no idea what's involved, in which case we probably need to go off and to a bit more research before we present this story.

  • Small - Some sources replace "small" with "sized appropriately." Size will vary on your shop, but obviously, a story can't represent more than a single iteration's worth of work. A better rule of thumb would be that no story would represent more than half the iteration. To me, an ideal size would mean that your team could crank out a story every day. It is possible to make stories too small, but very rare, so the general rule is "smaller."

  • Testable - If you can't verify it in some manner, how do you know it's done?

Meaningful Names




Want a challenge? Take a nice. long paper you've had around for years, expand it into a chapter in a book and then try to fit it on a flashcard. Flash cards are like poetry. You have to cut the material down to the smallest and most poignant form you can so that it is memorable and sparks memories in the psyche of the reader. I'm hoping that this one doesn't find me a poor poet.


The name says what it is for, not what it is.
Poor names tend to say nothing, or the wrong thing. Take the variable 'x' for example. How about psz? I hope it's a pointer to a zero-terminated string, but I can't bet on it and have no idea what it is for. The English word for one piece of automobile safety equipment is "windshield". I'm told that the Japanese name for the same device is "front glass". In variables, we lean toward "windshield" and away from "front glass" if we want our code to make sense to our peers.


Avoid encodings of any kind
. Psz (from above) encodes type enformation. IDoldrum contains an encoding "wart". Psz_agey may be a string containing a person's age in years, but that's not obvious. The example from paper and book is "gen_ymdhms". These encodings can be learned, and having been learned they join the other pile of pointless minutae in the reader's mind, but that's not a good reason to use them, no more than I should pinch your arm just because I know you'll heal. We write names to make code obvious and clear.


Functions and variables have the same level of abstraction
. For this reason, in a Person class, we expect fullName, birthDate, address. We don't really expect stringDict or autoPtrArray to be part of a person.


Use pronounceable names
. Most people read text with their innervoice, so a pronounceable name keeps them from say "blah-blah-blah" mentally (leading to errors). It is also much easier to collaborate with a pair partner if you can actually pronounce the names of the symbols you're manipulating. Finally, you can explain the program to new partners, bosses, or the like if you have pronounceable names. Names exist to communicate. Don't hamstring them.


Shun names that disinform or confuse
. Don't call something 'list' if it is not a list. (Even if it is a list, don't call it 'list' but that's covered elsewhere in this list.) Avoid calling a variable 'ram' or 'mem'. Don't call your internal integer a 'socket' unless it is a socket. Don't use 'iSomething' for non-interfaces. If you use a name that causes a peer to misunderstand your code, take it as a coding error and fix it. Renames are cheap.


Make context meaningful.
Don't add gratuitous warts at the front or back of your names. Especially the fronts. If everything in your application is named with the prefix 'app_' then you are causing people to look into the middles of names to find meaning in the name. Likewise, the dotted-names common in object-oriented code should have meaning in their context (person.age() v. person.session()). If a variable name is out of place in its namespace, class, or module then perhaps it is because it is in the wrong place.


Length of identifier matches scope
. For a local variable in an anonymous one-line function, x, y, z, i, j, k are all fine variable names. For a variable in a 12-line function it is insufficient. For a parameter to a function call it is wholly in appropriate. As class names or module names, these are insanely poor choices. A global variable with a short name is an abomination on so many levels. This rule was gleaned from James Grenning, and I think it should have been in my original paper. It's a good rule.


No lower-case L or upper-case o, ever.
1t should be pretty Obvious that nob0dy shou1d ever cause others to confuse the letters 1 and 0 with the numbers l and O. It obviously is okay for you to use L in names like oldName and capital O in something like Organize but it is an act of purest naming evil to have a name that consists only of l and O.  It doesn't take a genius to see the confusion in "return l - 12 > O;"

Discipline



The word "discipline" often is invoked to say that you should do more grunt work, jump through more bureaucratic hoops, and extend your hours. That isn't really necessarily the same as discipline in an agile sense.

Discipline in the agile sense is more like character, in that it consists in doing the things you know are right. What things are "right"? In this case, it is those things that align with the XP values of:
  • Communication
  • Simplicity
  • Feedback
  • Courage
  • Respect
This card is an improvement of one that I kept handy in my last engagement. It inspired me to do make improvements where I could. When I made errors, it was there for me. Sometimes it was mocking me, because I momentarily failed to learn to what it was teaching. I guess you know a path is a good one when staying on it brings success, and straying from it brings failure.

Impose simplicity on all the software you touch, because code will naturally decay if not tended, and because it's often easier to invent a complex (or complicated) solution than to create a very, very simple model. Also because changes tend to cluster, so the code you change today is likely to be touched again tomorrow and next week. It is also possible that you may have to change it months from now when the current design foibles are no longer foremost in your mind.

Clean the code you touch. If you only make the smallest changes possible, it may seem like good risk management, but it equates to making interest-only payments on your technical debt. It's not a way to make a better next month. Instead, all the participants on the project should clean all of the code. Sometimes we avoid this, for fear that the code will be continually rewritten by different people. We consider that a good thing, if you replace "rewritten" with refactoring. We find that code moves from cleaner to cleaner forms as different people touch it. Each programmer should clean the code, and expect the others to clean it further.

Work on only one thing at a time because multitasking is a deal with the devil. If you split time among tasks, you do them all less well than if you could focus on one at a time. Because you have many things to do each day, you may want to work on the periodicity of your efforts to find a sweet spot. Some people recommend 48-minute work sprints, others may suggest 25-minute stretches. The thing they agree on is that focus on a single task means getting work done. Refuse the urge to multithread your head.

Work in exceedingly small steps. Michael Feathers describes programming as the art of doing one thing at a time. We learn through refactoring and Test-Driven Development to do only very small things between tests so that we only have one reason to fail: if one changes only one line of code at a time, and runs all tests after each change, then one knows exactly which line of code broke the tests. This applies to non-programming tasks too. One discrete, small step at a time can lead to more things being truly done sooner than ever before.

Shorten your feedback loops to the smallest period and the smallest number of participants possible. Do you need to have security review of code? Move the security reviewer closer to the coders. Perhaps embed them in the team. Perhaps bring them in as pair programming partners. Does the Customer take a long time to respond? Co-locate them with the team, demo changes as they are made, have the Customer run the iteration-end demo. Feedback loops are better if they are richer, faster, sooner.

Be transparent, especially in difficulty. This is where discipline gets sticky. The easiest thing to do when one is in trouble (or wrong) is to lie about it. This is also the worst thing that could happen. The point is to be transparent with those who can help you. A hidden problem is never addressed. If you are being assigned more work than you can complete, it's better to get the work flow reduced than to pretend you are going to "make it up" by working harder in the future. The whole "fail fast" value depends on being frightfully frank. Note that transparency in success is a pretty darned good idea, too.

Obviate and remove process steps because every process is inefficient to some degree. Note the word "obviate", because every moderately successful process produces some kind of (perceived) value at each step. If you can cease reliance on a given value, or gain equal-or-better value in other steps, then you can remove the the targeted step. For instance, you may have a long QC cycle following a coding sprint. If you can automate all tests for existing features then you may eliminate manual regression testing. If you can embed testers in the development team (shortening a feedback loop) then you can have them add to the automated test suite on-the-fly. It is possible then to eliminate the tail-end QC cycle. The same thinking applies to the process you are automating inside the program you are writing.

Go home before you break something. Or at least after reverting the first breakage that is due to lack of rest. Do not work yourself stupid. One might have to "power through a day" sometimes, but on those days one should not allow oneself to do anything important alone. And one should not push too hard to be the decision-maker of the pairing. One should listen to one's pair partner suggesting that one might need a coffee break or to leave early. Beware the urge to work scads of overtime in order to seem dedicated; be the one who works best, not the one who works longest. Don't expect to be 100% when you're on prescription antihistamines. Go recharge your batteries, and come prepared work all the harder when your brain is fresh tomorrow.

Great Laws of Software Development






I can never remember the name of Parkinson's Law! I also think these are cool laws, so I really like this card. I'm sure Tim and I could add a couple good laws to the card. Got any favorites?


Gerald Weinberg has a treasure trove of laws in his book The Secrets of Consulting; those alone could easily fill a few cards.

Standup Meetings

Source: Tim Ottinger and Jeff Langr


The very existence of "agile in a flash" reference cards might suggest that you can be agile if you just check off all the items on the cards. Nothing could be further from the truth. If you treat agile as a checklist process, you will probably not succeed with it. These cards are gentle reminders and guidelines, not absolute prescriptions and rules that can't ever be broken.


This card on stand-ups is a classic example. I can't remember how many times people have told me that they stopped doing stand-ups because they seemed such a waste of time. Lots of times, absolutely. The meetings were getting too long, and they were tedious.


Well, sometimes you do just want to follow the rules. Or at least, you don't want to break the rules until you understand why they exist. Stand-ups that go on too long aren't usually stand-ups, they're sit-downs (and thus break the rules). Once people get seated, they often have a tough time getting up off their rear end. We're here, we're comfortable, let's just dig into all these issues.


Stand-ups, as Ron Jeffries once remarked on one of the many lists he frequents (onto which he has probably posted over 100,000 messages, and so I'm not about to go try and find the message), are a bare minimum for daily conversation between a team. A stand-up isn't intended to capture all group conversations that should occur during the day. It's a starting point, to see who's there, and find out who you need to talk to later.


I've blogged a few times about how tedious stand-ups are often a direct result of avoiding collaborative work. In addition to a throw-it-over-the-wall mentality (BA->dev->QA), developers work in silos: You do that story, I do mine. There's little reason for us to talk to each other. The only people who really care much about what you're working on are you and the project manager. I care only enough to make sure you're not dragging us all down. Otherwise I'm not really listening to what you have to say during the stand-up; I'm figuring out what I'm going to say and trying to think about how to make it sound clever.


Don't work that way. Find a way to work as a team and collaborate on stories, and look to deliver a new story every day or two if you can. (And don't make excuses why you can't get it to a day or two, but instead keep trying to move in that direction.) Your stand-ups will become far more interesting!