I have learned quite a lot from reading Lean Software Engineering, a website authored by Corey Ladas. Although I am not personally acquainted with him, I have gained a great deal of respect for his thinking and his experiences in applying lean principles to software development.
My growing respect for him may be the reason I was struck by a comment of his that I came across recently while browsing the site. In a response to a reader’s comment dated January 7, 2008, he wrote: “Pair programming is antithetical to Lean.” It was just a flat assertion with no explanation.
I was puzzled. Lean software development doesn’t speak to particular development practices, as far as I know. What might cause a practice to be antithetical to lean or, for that matter, to support lean? The only basis I could think of on which to reach such a conclusion was that the practice either hinders or helps us in applying lean principles to software development. Corey did not provide the same level of careful analysis and explanation as he does with most of the useful material on his site, so I decided to reason through the question myself.
Lean priorities and the three Ms
To be sure I shifted my brain into the right gear before exploring the question, I began by reviewing some key principles of lean software development. There is a widely quoted statement from David Anderson that goes something like this:
Value trumps flow
Flow trumps waste reduction
Eliminate waste to improve efficiency
Unlike a manufacturing process, any given step in a creative process will run at varying speeds. When we adapt concepts from lean manufacturing to creative endeavors such as software development, we must take this into account. In a software development process, the priorities David stated might be realized by maintaining buffers of incomplete work items between value-add activities, so that value-add work won’t come to a halt when an upstream step slows down. By the same token, WIP limits prevent overfilling a buffer when an upstream step speeds up.
By definition, the buffers contain inventory, and by definition, inventory is waste. We intentionally keep a certain amount of inventory in buffers in order to maintain continuous flow, which is of higher priority than waste elimination. People sometimes use the drum-buffer-rope metaphor to describe this.
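The trade-off between flow and inventory can be made concrete with a toy simulation. This is only a sketch in Python, and everything in it is my own assumption for illustration: the function name `simulate`, the tick-based model, a downstream step that finishes at most one item per tick, and an upstream step that simply stalls when the buffer has reached its WIP limit.

```python
def simulate(upstream_output, wip_limit):
    """Toy model of two steps separated by a WIP-limited buffer.

    upstream_output: items the upstream step could hand over in each tick.
    wip_limit: maximum number of items allowed to wait in the buffer.
    Returns (downstream_idle_ticks, items_completed, peak_inventory).
    """
    buffer = 0           # items waiting between the two steps
    idle_ticks = 0       # ticks in which downstream had nothing to pull
    completed = 0        # items finished by the downstream step
    peak_inventory = 0   # largest buffer level seen (inventory, i.e. waste)

    for produced in upstream_output:
        # The WIP limit caps what upstream may add; the rest is not produced.
        accepted = min(produced, wip_limit - buffer)
        buffer += accepted
        peak_inventory = max(peak_inventory, buffer)

        # Downstream pulls one item if one is waiting, otherwise it idles.
        if buffer > 0:
            buffer -= 1
            completed += 1
        else:
            idle_ticks += 1

    return idle_ticks, completed, peak_inventory


bursty = [3, 0, 0, 3, 0, 0]   # uneven upstream pace, one item per tick on average
tight = simulate(bursty, wip_limit=1)
roomy = simulate(bursty, wip_limit=3)
```

With the bursty input above, the tight limit leaves the downstream step idle for four of the six ticks, while a limit of three keeps it busy throughout at the price of holding up to three items of inventory. That is the sense in which flow is bought with a deliberate, bounded amount of waste.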
Mura and induced work
Mura generally means unevenness, inconsistency, or irregularity. In the context of manufacturing, it refers to variation in units produced on a production line. In the context of software development, it means uneven work flow. Uneven flow consists of stopping and starting, slowing down and speeding up, waiting, rework and defect correction.
In my own work, I’ve observed that mura leads to muri (overburden) and muda (waste). From conversations with others who practice lean software development, I gather that I’m not the only one to make this observation. I call mura a First Domino: a problem that tends to lead to further problems. Any time we can prevent a First Domino from falling, we prevent all the downstream problems it would have caused.
So, it suits my inherently lazy nature to identify First Dominoes and prevent them from falling, as it relieves me of the need to deal with a multitude of secondary issues later on. The notion of eliminating unnecessary activity is also consistent with lean thinking. Maybe that’s one of the reasons I appreciate lean thinking.
In software development, mura commonly shows up in forms such as these:
- Waiting for someone to become available to answer a question, to provide insight into a difficult problem, or to review a piece of work.
- Defect correction due to trivial oversights or mistakes in programming.
- Rework due to misunderstanding needed functionality, poorly factored code, or misalignment with architectural standards/guidelines.
- Stopping work on a value-add activity in order to address rework or defect correction (opportunity cost).
Alan Shalloway coined the term induced work to describe unnecessary work that we create for ourselves. We cause mura when we organize our work in certain ways or when we choose certain methods or techniques to carry out specific tasks. We cause stopping and starting when we choose to have too much work in process and we ask individuals to multitask across many concurrent initiatives. Our work slows down and speeds up when there is significant inherent variance in the level of difficulty of different tasks. We can cause this (or exacerbate it) through the way we decompose work into discrete tasks. When a task is carried out by an individual or work group lacking the necessary expertise to complete it independently, we cause waiting, as the person with the answer is often in high demand and is not available immediately. We cause rework when we believe we understand what is to be done and we complete a task accordingly, but our understanding was incorrect. We cause defect correction when we overlook something in our work; usually, something minor that simply fails to catch our attention in the moment.
I observe that mura induces work by creating muda and muri:
- Induced muda: Rework, waiting, defect correction.
- Induced muri: Overtime or work speed-up to try to meet original commitments on top of induced muda. Induced muri can lead to still more induced muda, as people tend to make more mistakes when they rush or when they are tired.
What effects (if any) does pair programming have on induced work?
Let’s see what the general effects of pair programming are, and then we can consider how those effects relate to the idea of mura and induced work.
Dr Laurie Williams and Dr Alistair Cockburn investigated the effects of pair programming, and reported their findings in “The costs and benefits of pair programming,” presented at the XP 2000 conference in Sardinia and published in the 2001 book, Extreme Programming Examined, edited by Giancarlo Succi. In exploring “eight paths of software engineering and organizational effectiveness,” the authors found that “all paths point to pair programming.” The eight paths were economic, satisfaction, design quality, continuous reviews, problem solving, learning, team building and communication, and staff and project management.
The investigators found this result “surprising.” That comment may seem insignificant, but I think it contains a clue about the reasons why many people assume pair programming is just some sort of game (and I use the word assume quite intentionally). The idea is counterintuitive. Even to people whose professional focus has been to study the effectiveness of software development processes, methods, and practices, the result was surprising.
Among other results, the authors summarize the outcome of a controlled experiment in pair programming Dr Williams conducted at the University of Utah in 1999. The experiment found that pair programming increased total development time by about 15% (not double, as one would intuitively expect), and reduced defects by about 15%. According to the paper, using conventional methods “programmers inject 100 defects per thousand lines of code. A thorough development process removes approximately 70% of these defects. Therefore, the individuals would be expected to have 1,500 defects remaining in their program; collaborators would have 15% less or 1,275 – 225 less defects. […] Using a fairly conservative factor of 10 hours/defect, if testing finds these ‘extra’ 225 defects they will spend 2,250 hours – fifteen times more than the collaborators ‘extra’ 150 hours.” (The quoted figures imply a program of about 50,000 lines: at 100 defects per thousand lines, 5,000 defects are injected, and 30% of them, or 1,500, remain.)
To put the result in the context of lean thinking, one could conclude pair programming can reduce lead times by 2,100 hours (the 2,250 hours of defect correction avoided, less the 150 extra hours spent pairing), given the same general conditions as the experiment. Whenever we are considering the cost of rework or defect correction, we want to consider opportunity cost, as well. Each hour spent in rework or defect correction is an hour forever lost to value-add work. In that sense, the cost of the 2,100 hours of defect correction is really equivalent to 4,200 hours.
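The arithmetic behind these figures is easy to retrace. The sketch below is in Python; the function and variable names are mine, the input numbers come from the passage quoted above, and the 50 KLOC program size is derived from those numbers rather than quoted. The final doubling is my own opportunity-cost reasoning, not a result from the paper.

```python
def pairing_economics():
    """Retrace the worked example quoted from Williams and Cockburn."""
    # Figures taken from the quoted passage.
    injected_per_kloc = 100        # defects injected per thousand lines
    removal_percent = 70           # share removed by a thorough process
    solo_remaining = 1_500         # defects left in the solo programmers' code
    pair_reduction_percent = 15    # fewer defects when pairing
    hours_per_defect = 10          # "fairly conservative" correction cost
    pair_extra_hours = 150         # extra development time spent pairing

    # Derived, not quoted: the program size these figures imply.
    remaining_per_kloc = injected_per_kloc * (100 - removal_percent) // 100
    implied_kloc = solo_remaining // remaining_per_kloc

    # Defects the pairs avoid, and what correcting them would have cost.
    defects_avoided = solo_remaining * pair_reduction_percent // 100
    correction_hours = defects_avoided * hours_per_defect

    # Net saving after paying for the extra pairing time.
    net_hours_saved = correction_hours - pair_extra_hours

    return {
        "implied_kloc": implied_kloc,
        "pair_remaining_defects": solo_remaining - defects_avoided,
        "defects_avoided": defects_avoided,
        "correction_hours": correction_hours,
        "cost_ratio": correction_hours / pair_extra_hours,
        "net_hours_saved": net_hours_saved,
        # Assumption from this post: each hour of correction also
        # displaces one hour of value-add work.
        "with_opportunity_cost": 2 * net_hours_saved,
    }
```

Calling `pairing_economics()` reproduces the 225 avoided defects, the 2,250 correction hours, the fifteen-to-one ratio, and the 2,100 and 4,200 hour figures used in this post.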
An earlier study of pair programming was conducted by Dr Randall Jensen in 1975. Writing in Crosstalk: The Journal of Defense Software Engineering for March, 2003, he explained, “I was introduced to teamwork and pair programming indirectly as an undergraduate electrical engineering student in the 1950s. Later in 1975, I was asked to improve programmer productivity in a large software organization. The undergraduate experience led me to an experiment in pair programming.”
The term “pair programming” had not yet been coined at that time, and the investigators used the term “two-person team” to describe the technique. The way they arranged the work environment for the two-person teams would be recognized immediately by today’s software developers as a pair programming set-up. The 1975 study found that a two-person team produced an error rate roughly three orders of magnitude lower than individual programmers developing the same solutions. Dr Jensen commented that “a three order-of-magnitude improvement in error rate is hard to ignore.” It seems he underestimated people’s capacity to ignore things that don’t correspond with their preconceptions.
To date, there have been very few original and well-crafted studies of the effectiveness of pair programming. One can find many published papers, but most of them are not reports of original research, but surveys of existing literature. The two studies cited here are the only ones I know of, personally, that provide credible experimental evidence about pair programming. Therefore, I wanted to look for some other form of evidence that close collaboration may be a useful technique. The anecdotal evidence from programmers is mixed; some report positive experiences and others report negative experiences.
It turns out that close collaboration (what we might call “pairing” in a general sense) works well when certain conditions are met. When those conditions are not met, then there is no benefit in pairing, and it may even yield poorer outcomes than solo work. For example, in a paper entitled “Optimally Interacting Minds,” published in Science for August 27, 2010, and summarized in a piece on Discover magazine’s website, investigators reported the results of a controlled experiment that explored the question of whether two people working in collaboration completed various tasks more effectively than people working alone. They found:
For two observers of nearly equal visual sensitivity, two heads were definitely better than one, provided they were given the opportunity to communicate freely, even in the absence of any feedback about decision outcomes. But for observers with very different visual sensitivities, two heads were actually worse than the better one. These seemingly discrepant patterns of group behavior can be explained by a model in which two heads are Bayes optimal under the assumption that individuals accurately communicate their level of confidence on every trial.
In the context of pair programming, this result suggests that the effectiveness of pairing depends strongly on the way in which the two programmers interact with one another, and on their self-awareness regarding their own skills. I see a connection here with observations I have made in the past regarding our lack of a professional corpus of knowledge that we transmit from generation to generation of software developers. Each generation tends to re-invent every wheel for themselves. (See “Can a new dog learn old tricks?” on this blog.) Software development seems to take place in Lake Wobegon, where “all the women are strong, all the men are good-looking, and all the children are above average.” Most of the above average children stand on the bottom rung of the Conscious Competence Ladder. If they try pair programming and they have difficulty with it, they may not be equipped to understand why, and simply conclude that pair programming “doesn’t work.”
If the value of close collaboration in technical work has been recognized at least since the 1950s, and if pair programming specifically was vetted experimentally as long ago as 1975, then why do people in the software development field still argue about it today? Looking back over the history of software development methods and techniques, I am struck by a recurring pattern: Generation after generation, various development techniques are discovered and forgotten, discovered and forgotten, discovered and forgotten over and over again. Each re-discovery re-brands the technique, and when a competing methodology dares to suggest the same technique with a different brand, the two camps attack one another with the same intensity as two closely-related sects of the same religion.
The 2007 post in which Corey made the comment about pair programming was, in fact, a rant favoring one sect over a competing, nearly identical sect within the religion of effective software development practices. It was not representative of his usual objective, analytical approach. The assertion that pair programming is antithetical to lean does not withstand an objective assessment of the effect of the practice on lean-defined forms of waste. It is an emotional statement. And, in fairness, the comment was published several years ago. The author’s understanding may have changed since that time.
Why is it that smart people can resort to statements of this kind? It may just be a feature of human nature. We all have blind spots. I appreciate Corey’s work and that of other people from whom I can learn. I hope I can also help people learn. But we would be wise to avoid just accepting any statement a person makes, even if we respect that person highly. It’s better to take what they say and run it through our own filters: reason through it independently, or try what they suggest and see whether it works for us. Then, whether the outcome is good or bad, we should try to understand why and how the outcome came about, rather than just accepting or rejecting a practice universally.
Conclusion: Pair programming is a natural fit for lean development
The effects of pair programming include reducing wait time and reducing errors. Cross-disciplinary pairing also enhances clear communication and common understanding of what needs to be done, reducing rework. Pair programming, and pairing in general, are therefore perfectly aligned with lean thinking and lean software development. They directly reduce mura, which is a First Domino, and by extension a good deal of muda and muri as well. They also promote double-loop learning by cross-pollinating the knowledge and experiences of team members, helping to support the lean concept of continuous improvement (pursuit of perfection, the fifth lean principle).