Does pair programming work?

The fact this question continues to come up time and again after all these years prompted me to wonder why the matter hasn’t been settled by now. Thousands of people have tried their hand at pairing in a wide range of circumstances. Some swear by the practice and feel as if something is missing when they must work solo. Others are convinced pairing is pure waste and cannot possibly yield good results. Both opinions are informed by real-world experience. What specific differences in these situations resulted in such radically different outcomes?

Binary thinking in the land of applied logic

My first stop on a mental journey to find an answer was the observation that our line of work suffers from a strange and ironic trait: On the one hand, software development can be described as applied logic; on the other hand, many of us tend to comprehend any method, practice, framework, or approach in a completely binary way…either the thing "works" or it "doesn’t work," universally and forever, regardless of the conditions under which the thing was attempted or the level of skill with which practitioners applied it.

While software development practitioners will disagree about many things, I suspect most of us can agree that binary thinking is fundamentally illogical. Of all the occupations in the world, ours (along with at least mathematics, philosophy, and engineering) depends on logical thinking. With that in mind, why do we grace a question of the form, "Does X work?" with even a moment’s consideration? Why do we not recognize that the question cannot be sensible, from a structural standpoint?

Why do people who earn their living by the practical application of logic assume (apparently) that by labeling whatever they happen to be doing with some buzzword they can reach a judgment about the practice that the buzzword represents? Where does their logic go to hide while they debate "Does X work?" questions? I’m afraid this might be a psychological question and, therefore, outside my area of expertise. I am left with the suspicion that those who take a purely binary stance on such questions must have something to "sell." Whether they are selling pair programming or the status quo ante, it puts me on guard.

What does "it works" mean?

The second stop on my journey was to try and come up with a practical definition of "it works" or "it doesn’t work" that might be useful for the purpose of understanding whether, how, when, and why pair programming wor— er, that is, might help us deliver good results. Quasi-religious arguments about pairing haven’t settled the issue, despite a massive, dedicated, years-long effort on the part of red-faced pundits screeching with the volume turned up to eleven.

Nor have academic studies offered much help in settling the matter. It’s easy to find papers online and in the proceedings of various software conferences that address the topic, but most of them refer back (sometimes through several layers of references) to a single study carried out in 1999 at the University of Utah. The results were reported at the XP 2000 conference by Alistair Cockburn and Laurie Williams, in a paper entitled "The Costs and Benefits of Pair Programming."

That wasn’t the first study of the practice, however. In 1975, the US Army carried out a study of pair programming, which they referred to as "two-person teams," in a significant software development project. A paper in Crosstalk for July 1996 summarized the study. Unfortunately, the article is no longer online, as Crosstalk reorganized their website and removed back issues prior to 1998, presumably on the assumption that the older material would not be of much interest.

I’ll mention some of the results of those two studies a bit later. For the moment, my point is simply that there is a dearth of credible, well-structured studies to investigate the effects of pair programming. This suggests that we cannot rely solely on direct studies of pair programming to get a sense of how the technique works or its effects on quality and cost.

"It works" strikes me as contextual, and yet the majority of people whom I hear or read pontificating about pair programming ignore context and rely on broad generalizations. They are quick to say "it works" or "it doesn’t work," but slow to explain the conditions under which pairing may add value and those under which it can only add overhead.

I will assume, then, that "it works" means pair programming helps us achieve the benefits its supporters cite most frequently: Improved code quality, higher team bus number, reduced cost of ownership of the resulting solution, and mentoring junior team members. In situations where pair programming helps achieve those benefits, we can say "it works."

On further consideration, it may be more accurate to say it might work. It turns out that there are several additional factors to consider.

Physical constraints

The third stop on my journey to an answer was to recognize that physical constraints can prevent pairing. It’s my observation that people tend to follow the course of least resistance. When pairing is the easiest thing to do, they do it. When working solo is the easiest thing to do, they do it. There doesn’t seem to be a connection with whether the individuals on the team feel pair programming is or isn’t effective.

One team that I coached expressed the desire to "pair more." They had defined it as a team goal. Yet, they were not pairing at all. In every retrospective, they raised the issue and agreed, again and again, that they wanted to pair more. The team had eight programmers who worked in a collaborative space with eight identically-configured workstations. One day, their manager and I stayed after hours and removed four of the workstations. The next day, the programmers found the easiest thing to do was to sit at a pairing station with a partner. Guess what? They paired more.

At another company, people thought pair programming was a good idea, but they didn’t want to give up their individual cubicles. Management set up pairing stations at the end of each row of cubicles. No one ever used them. The simplest thing to do was to sit in one’s own cubicle. Pairing would have required extra effort, first to find a partner to work with and then to go sit somewhere other than one’s usual place. It wasn’t the course of least resistance.

For me, the lesson in this is that if the team or their management or both want programmers to pair up, they need to make that style of work the course of least resistance. Making it "easy" or "not too hard" isn’t enough. It has to be the easiest thing possible.

Management issues

The fourth stop on my journey was to recognize that management style can have a dampening effect on all sorts of good practices, including pair programming. When a team or an organization desires to take advantage of the lessons learned in recent decades around lightweight methods and collaborative work, management must understand the implications of metrics and employee assessment practices.

If management tracks delivery performance at the individual rather than at the team level, people will be penalized for collaboration of any kind, including pair programming. If management assesses employee performance at the individual level, then programmers will be incented to show individual results at the expense of their team mates, and will be disincented to collaborate freely. As Eliyahu Goldratt famously said, "Tell me how you will measure me, and I will tell you how I will behave."

When a team attempts to employ a collaborative working style, their management has certain responsibilities to ensure the team has a fighting chance to get their work done. One of the key responsibilities is to ensure the team is left to its work for significant chunks of time during the work day. When individuals are interrupted, they lose the sense of "flow" and forget what they were doing before the interruption. When pairs are interrupted, the negative impact is doubled.

The nature of the task

The fifth stop on my journey was to recognize that some tasks lend themselves to collaborative work while others lend themselves to single-minded focus; and I mean, literally, single-minded.

Practitioners of pair programming who have reached a high level of proficiency with the technique will tell you they don’t insist on pairing for every task. They point out that some tasks call for quiet, steady focus rather than continuous collaboration. Coaches experienced in Extreme Programming and Software Craftsmanship often advise novice teams to pair by default and work solo by exception. This is a mechanism to inculcate an unfamiliar skill, and is not meant as universal advice with no context. In contrast, people who are skeptical about the value of pair programming but who are not dead-set against it often advise teams to resort to pairing only under special conditions, such as when helping a new team member come up to speed on the codebase or when tackling an especially challenging task.

The range of different advice about when to pair suggests that practitioners are aware that some tasks benefit more than others from the practice. Not everyone has the same opinion about exactly which tasks those are. Proponents see value in pairing for a wider range of tasks than skeptics. Nevertheless, practitioners do recognize that every task won’t benefit from pairing, and that in itself suggests a simple binary answer will not accurately describe the usefulness of the technique.

Learning curve

The sixth stop on my journey to an answer was to recognize that pair programming is a learned skill, and to master any learned skill requires time and effort. The well-known Satir change model, developed by family therapist Virginia Satir, has been applied to many domains. The same idea is expressed as an S-curve or the Change Curve, depending on whom you ask. The basic idea is that whenever you introduce a change, things will get worse before they get better.

Detractors of pair programming are fond of leaping to the conclusion that "it doesn’t work" based on the fact that novices don’t immediately improve their performance within an hour of trying pair programming for the first time in their lives. The truth is you won’t be able to tell whether pair programming "works" in your situation until you’ve given it a fair trial.

How you do it matters. In an interactive session I facilitated with Lasse Koskela, Brett Schuchert, Ryan Hoegg, and George Dinwiddie at Agile 2008 and Agile 2009 and with audience volunteers at Devoxx 2008, we explored several ways in which the two partners can completely spoil pair programming. A large number of ineffective collaboration patterns exists. Hammers are not categorically useless just because careless people sometimes hit their thumbs. It’s up to you to hit the nail squarely, and there are many more ways to hit your thumb than there are to hit the nail squarely. As pair programming is not a biologically hard-wired instinctive behavior, you need to allow for learning curve time…unless you’ve already made up your mind against it, of course. That’s always a real time-saver.

Personality types

Another dimension of complexity in answering this question is the effect of different personality types on the outcome of a pair programming session. The seventh stop on my journey to an answer was to consider what might happen when the two partners have particular personality types, and they haven’t learned how their own type affects collaboration.

Many teams take personality assessments to help them understand one another so that they can work more effectively as a team. Two popular assessments are the Myers-Briggs Type Indicator (MBTI) and the DISC Profile. Both models reduce the rich tapestry of human existence to a few general archetypes that predictably exhibit a handful of simple behaviors. The DISC Profile reduces us further than the MBTI, using just four categories: Dominance (D), Influence (I), Steadiness (S), and Conscientiousness (C). Let’s use that one here, just because it’s simpler.

What might happen if two D types sat down to pair program together? Unless they were aware they were D types and understood how to mitigate their own behavior to achieve effective collaboration, they might get into circular debates about technical implementation, and never finish their task. If two I types paired together, they might try to teach or influence each other in a circular fashion, also never finishing their work. If two S types paired up, they might be reluctant to refactor code without asking the original author if it was okay with him or her. If two C types paired, they might slip into "analysis paralysis," never finishing their task.

If a strong D type paired up with a partner having any of the other types, he/she might just drive ahead with his/her own ideas. This could result in a worse outcome than the team would have achieved if the two had worked separately and alone. You can probably imagine negative outcomes resulting from any other combination of types, too.

Of course, models like DISC and MBTI are only valid in the aggregate. They are statistical models. Real individuals don’t usually behave exactly as the models predict they will. Even so, the combination of personality types in a pair can have an effect on the outcome of the pair programming session, unless the individuals are aware of it and consciously deal with any collaboration issues.

When two heads are better than one

The eighth stop on my journey to an answer was to consider how two-person collaboration actually works.

In "Optimally Interacting Minds", in Science Vol. 329 No. 5995, August 2010, Bahador Bahrami, Karsten Olsen, Peter E. Latham, Andreas Roepstorff, Geraint Rees, and Chris D. Frith reported the results of a study they conducted to explore the idea that two heads are better than one. They wondered for what sorts of activities two heads might actually be better than one. Here’s an excerpt from the abstract of their paper:

For two observers of nearly equal visual sensitivity, two heads were definitely better than one, provided they were given the opportunity to communicate freely, even in the absence of any feedback about decision outcomes. But for observers with very different visual sensitivities, two heads were actually worse than the better one. These seemingly discrepant patterns of group behavior can be explained by a model in which two heads are Bayes optimal under the assumption that individuals accurately communicate their level of confidence on every trial.

For a slightly less dry description of the experiment, we can turn to Scientific American for August 31, 2010, in which Ryota Kanai and Michael Banissy conclude, "Are Two Heads Better Than One? It Depends." What does it depend on? As they explained it:

The key to the success was communication — about how they felt about their answer and how confident they were in their decision. When they were not allowed to communicate with each other about their confidence, they couldn’t do any better than the best solo player. […] It was also critical that both players reported their confidence reliably. If one of them was poor at the task but didn’t know it, the team performance only got worse. […] This implies that two heads may be better than one, but only when we can competently discuss our different perspectives. If one person in the team has flawed information — or is less competent — then the outcome can be negative and perhaps you should completely ignore them.

None of this concerned pair programming specifically or software development in general. How does it apply to the subject? I think the results suggest that when the partners have similar levels of skill and they are able to communicate freely and accurately, they can achieve better results than they could have done by working separately and alone. If the two do not communicate almost constantly throughout the pairing session, then the outcome is likely to be poor. If their skill levels are different and the less-skilled partner perceives him/herself as highly skilled, then their results are likely to be worse than the more-skilled partner could have achieved alone.

I emphasize the word "and" in the previous sentence because we often use pair programming explicitly as a mechanism to ramp up junior team members. The key point is that when we are doing this on purpose, it isn’t a problem. The problem occurs when we think the two partners are technical peers, and they are not.
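The break-even point implied by that finding can be sketched numerically. The short Python example below assumes the "weighted confidence sharing" model that, as I understand it, the Bahrami paper proposes, in which a pair that communicates confidence accurately achieves a joint sensitivity of (s1 + s2) / sqrt(2). The function names and the sample sensitivities are hypothetical, chosen only for illustration, and nothing here measures programming skill.

```python
import math


def pair_sensitivity(s1: float, s2: float) -> float:
    """Joint sensitivity of a pair under the assumed model: (s1 + s2) / sqrt(2)."""
    return (s1 + s2) / math.sqrt(2)


def two_heads_beat_the_better_one(s1: float, s2: float) -> bool:
    """True when the pair is predicted to outperform its stronger member."""
    return pair_sensitivity(s1, s2) > max(s1, s2)


# Hypothetical sensitivities on an arbitrary scale (assumption, not study data).
print(two_heads_beat_the_better_one(1.0, 0.9))  # near-equal partners
print(two_heads_beat_the_better_one(1.0, 0.3))  # very unequal partners
```

Under this assumed model, the pair comes out ahead only when the weaker partner’s sensitivity is more than about 41% of the stronger partner’s (sqrt(2) - 1). That matches the qualitative pattern in the abstract: near-equal partners gain from working together, while very unequal partners do worse than the better one alone.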

By the way, communication doesn’t necessarily mean continuous talking. In a style of pair programming known as "silent running," the partners communicate only by writing code. The code each one writes communicates something to his/her partner about what code to write next. If you aren’t a programmer yourself, this might not make intuitive sense; all I can say is "trust me."

The Lake Wobegon Effect

The ninth stop on my journey to an answer was to recognize that most programmers have an unrealistic notion of how skilled they are. The US radio program, A Prairie Home Companion, takes place in the fictional town of Lake Wobegon, where "all the women are strong, all the men are good-looking, and all the children are above average." Programmers, apparently, hail from this very town. We are subject to the Lake Wobegon Effect, whereby we overestimate our own skills.

A less colloquial take on the phenomenon is known as the Dunning-Kruger Effect, after a paper published in 1999 by those authors. The phrase "Dunning-Kruger Effect" has entered into popular mythology to suggest that incompetent people believe themselves to be excellent performers while competent people know they are excellent performers, so they have no problem. Since, in the popular imagination, both sets of people consider themselves to be excellent performers, I have to wonder how any of us can tell the difference. As Dr. Dunning clarified in a comment to the cited blog post, their study actually found that "poor performers are overly confident relative to their actual performance. They are not more confident than high performers."

In view of the results of the Bahrami study of "two heads," the Dunning-Kruger Effect holds when the two members of a pair are of unequal technical skill. This is fine when the purpose of pairing is to enable the senior member to mentor the junior member, but it has a detrimental effect when the pair is assumed to consist of peers. In that case, the outcome is likely to be worse than the senior member could have achieved by working solo. It negates the positive effect of "two heads."


The tenth stop on my journey to an answer was to consider the importance of self-discipline in applying good practices when we are under pressure to deliver. My observation is that most of us tend to set aside good practices when we feel as if we have to rush to complete a task. Two experiences in particular come to mind.

The first was a workshop conducted by Alistair Cockburn at the Agile 2010 conference entitled, Elephant Carpaccio. Teams of two or three attacked a simple programming problem in nine-minute iterations, slicing the requirements as thinly as possible and using whatever approach and techniques they pleased. At the end of each iteration, the incremental results had to be demonstrated to a "customer."

Afterwards, Alistair explained that he had run this workshop with a total of 400 people (as of that time), and that only three had applied good practices and completed the code. The rest had chosen either to complete the code by "hacking" rapidly, or to follow good practices and run out of time before they had finished everything.

The successful team showed that it is possible to achieve good results through the disciplined application of good practices. Meanwhile, the relative number of participants who abandoned good practices in order to "finish everything" and those who applied good practices sluggishly showed that we have a tendency to lose self-discipline when we are under delivery pressure. If the numbers from the exercise are representative of behavior in the field, then under one percent of professional software developers apply self-discipline when under pressure. My guess is the numbers from the exercise are better than industry norms, as all the participants were self-selected software craftsmen. The situation in the "real world" is probably worse.

The second experience occurred at the XP Days Benelux conference in 2010. Marc Evers and I co-facilitated a workshop on developing causal loop diagrams, or diagrams of effects, as a way to understand root causes in complex environments. We called it Things never change. The participants comprised 35 programmers who were advanced practitioners and strong proponents of Extreme Programming.

We did not have a mixed group that might have brought more perspectives to the table. For that reason, the analysis tended to come from the point of view of programmers on a development project. It was interesting to see what was missing from their analysis. In particular, they could think of no potential causes for technical debt other than "management pressure to deliver."

Marc and I stopped the exercise briefly to discuss this. We had been thinking that technical debt arises when programmers fail to apply well-known good practices. The group was thinking that technical debt arises when management shouts about the delivery schedule. They seemed to make the assumption that the only way to "go fast" was to abandon good practices. If true, that would mean good practices are only "good" when things are easy. When things become hard, we must resort to "hacking." The idea did not appeal to me.

I asked the group how many of them believed the XP practices were the best way to deliver software. Every hand in the room went up. I suggested to the group that when we are under pressure, we demonstrate what we truly believe through our actions. When we are under pressure, we do what we think will get the job done, and nothing else. Therefore, if they abandon good practices when under pressure, it means they do not actually believe the XP practices are useful. Their managers are not "causing" technical debt; they are creating it themselves. They looked crestfallen. I was afraid I had lost a few friends, but ultimately I learned they had carried the message forward in their own work. Maybe there is hope, after all.

The rest of the puzzle

The eleventh stop on my journey to an answer was to recall that pair programming is usually a single piece of a larger puzzle; that is, we usually apply pair programming as just one practice in a set of related, mutually-reinforcing practices. These include such things as continuous integration, sitting together, writing unit tests before writing code, frequent refactoring, short feedback loops, and others. The more pieces of the puzzle we can put into place, the prettier the resulting picture.

When I’ve listened to stories of pair programming "failures," I’ve often noticed that people seemed to have attempted pairing outside the context of related good practices. I’m reminded of comedy fights in which the combatants slap wildly at one another while angling their heads backward, eyes tightly shut. They want to fight but they don’t know how. I think pair programming has the best chance to yield good outcomes when it is applied mindfully as part of a coherent set of development practices. When people just sit side by side and change no other variables in their environment, how good can the outcome be? <Slap, slap, slap.>

Where did pair programming come from?

It’s always dangerous to speak for others, and it isn’t my intent to do that now. I do want to take a quick look at a certain software development project because it gave rise to the popularity of a development method now known as Extreme Programming (XP). One of the practices central to XP is pair programming. What happened on that project that led the team members to conclude pair programming was a Good Thing?

Depending on whom you ask (and what they’re selling, I suspect), the Chrysler Comprehensive Compensation (C3) project in the late 1990s was either a roaring success or a dismal failure. The truth, I think, is that it was like any real-world application software development project in a corporate IT department. The team delivered a working system more-or-less on schedule and on budget. Enhancements continued for a couple of years after delivery, and at some point there was management turnover and the new management crew decided to move on to some other solution for their payroll processing needs. I had an opportunity to ask one of the team members, Chet Hendrickson, the reason the solution was abandoned. He explained that management felt it would be difficult to find Smalltalk programmers to maintain the code in the long term.

In the course of all that, the team worked out the details of XP. They came up with a robust and practical way to deliver application software products. XP has since been used in thousands of companies around the world with the usual sort of mixed results that any methodology achieves, mostly for the same reasons as every other software development methodology in history.

Pair programming was central to the work of the C3 development team, and it "worked" according to the definition I suggested above. It worked very well indeed. Why did it work?

Based on things I learned in my journey so far, I would say there were several reasons why pair programming was effective on the C3 project. First, it wasn’t their first pass at the problem; C3 was the third attempt by Chrysler to implement a payroll system. The people involved already had a pretty good idea of what a payroll system had to look like. They didn’t really need traditional, "heavyweight" requirements to tell them what to build. What they did need was frequent feedback from customers to make sure they built what was wanted, so they could avoid the mistakes of the first two attempts to create a payroll system. Second, the User Stories were not all radically different from one another; most were variations on a few themes, as they represented thin slices of application functionality for a problem that was well defined and, one might say, "solved." It wasn’t as if one User Story called for an inertial guidance system for a spacecraft, the next for a tax rate calculator, and the next for an embedded system to control a toll plaza for a highway. The team could drive through the User Stories without encountering too many speed bumps. The lightweight approach to requirements was perfectly appropriate, and the programmers could let the solution emerge bit by bit with very little risk. Finally, the individuals on the team were extremely high-level expert programmers who were experienced in collaborative work. They also enjoyed strong management support for the way they wanted to work.

The physical conditions were right, the management style was right, the nature of the work was right, the learning curve for pairing was short (for the particular individuals involved), most of the team members already knew each other well, and any two of their heads were better than one for the reasons discovered in the Bahrami study. Pair programming could hardly fail under those conditions. Add a dash of retrospective coherence and you’re on your way to a published methodology. All good.

To this day, XP remains the best way anyone can think of to approach certain types of application development. It has worked especially well for delivering web applications, and has been very useful in a number of other categories, too. Pair programming is among its most fundamental and most powerful practices.

What has pair programming led to?

Contemporary lightweight methods for software development and delivery emphasize the idea of the cross-functional team. The basic idea is that a team includes members who, among them, have all the necessary skills to achieve the goals of the project.

When cross-functional teams employ XP or another method based on the values and principles of the agile movement, they very often adopt pair programming as one of their core practices. Initially, this directly involved team members who specialized in programming, while those who specialized in other disciplines pertinent to software delivery worked solo.

A logical step forward from having individual team members with different skills is to encourage team members to learn one another’s skills. A concept that emerged during the time frame of the agile movement is the generalizing specialist, conceived by Scott Ambler. It describes a professional who has one or more areas of specialization, and who has learned and continues to learn at least basic competency in other related areas. For example, a programmer might learn a bit about software testing, or a database specialist might learn a bit about application programming. This enables teams to smooth out their work flow as the demand for different types of work shifts during project execution.

Cross-disciplinary pairing has become a key mechanism for developing generalizing specialists in an organization. What better way for a tester to learn, say, analysis skills than to collaborate directly with an analyst on real work?

A different thread of development in software development practice has benefited from cross-disciplinary pairing, as well. Test-Driven Development (TDD) gave rise to Behavior-Driven Development (BDD), which evolved into Specification by Example. A natural consequence of this is to merge the disciplines of analysis and testing. Cross-disciplinary pairing comes into the picture as the most direct way for analysts and testers to collaborate.

Examples as specifications can be automated, using tools like Cucumber and FitNesse, among others. How can analysts and testers learn the programming skills necessary to use these tools effectively? You guessed it: They pair up with programmers, and in the process the programmers learn a fair amount about analysis and testing, as well.

In a nutshell, cross-disciplinary pairing has had the general effect of improving everyone’s appreciation for the work their team mates perform, and has improved the quality of communication across disciplines. For those with an interest in the generalizing specialist idea, it has also provided a practical mechanism to expand our skill sets while working on real tasks, rather than taking time out for training classes.

What did the studies find?

In case you’re inclined to care about studies, here’s a synopsis of key findings of the two studies I mentioned earlier. First, here is an excerpt from the 1996 article in Crosstalk by Dr. Randall Jensen, entitled "Management Impact on Software Cost and Schedule." Here, Dr. Jensen recalls the two-person team experiment he conducted in 1975:

The two-person (2P) team implements the adage "Two heads are better than one." When this concept was first implemented in 1975, there was great concern the productivity gain could never offset the additional resource expense. The two-person approach places two engineers or two programmers in the same location (office, cubicle, etc.) with one workstation and one problem to solve. The team is not allowed to divide the task but produces the design, code, and documentation as if the team was a single individual. The 1975 team’s project was a real-time, multitasking system executive of approximately 30,000 FORTRAN source lines. The development team had five 2P teams and a progressive Theory Y type project leader. The traditional development environment, outside the 2P team organization, was typical of most environments. The architecture design was completed in a war room environment. The project architecture divided the development into six independent tasks, with the two smallest tasks assigned to one team. The 2P teams returned to the war room environment during system integration. The team concept appears to have been violated when the project was divided among five independent teams working in their own small war rooms (two-person offices). There were two organizational issues working here:

  • The facilities people (known as the furniture police) were not convinced this idea would work.
  • The tasks were truly independent. Thus, the minimum team size of two programmers was adequate, and the project proved the significant benefits of teams as small as two people.

Final project results were astounding. Total productivity was 175 lines per person-month (lppm) compared to a documented average individual productivity of only 77 lppm prior to the project. This result is especially striking when we consider two persons produced each line of source code. The error rate through software-system integration was three orders of magnitude lower than the organization’s norm. Was the project a fluke? No. Why were the results so impressive? A brief list of observed phenomena includes focused energy, brainstorming, problem solving, continuous design and code walk-throughs, mentoring, and motivation.

Dr. Jensen measures productivity in lines of code per person-month, a metric many people today consider anachronistic, but this is merely a reflection of the times. Notice his assessment of the reasons for success, and how similar the list is to the strengths of contemporary lightweight development methods. Notice, too, that every one of them is directly supported by pair programming.

In “The Costs and Benefits of Pair Programming”, Alistair Cockburn and Laurie Williams describe the results of a pair programming study carried out at the University of Utah in 1999. This one is online and you can read it yourself, so I won’t quote it as extensively as the Jensen piece. Among other things, the results of the study indicate that the cost of pair programming is not nearly as high as detractors claim. A pair could produce roughly the same result in about 15% more time than two people working separately. Furthermore, defects were reduced significantly, which ultimately amounts to savings in both cost and time. The paper mentions certain factors that help pair programming yield good outcomes, including the "expert within earshot" pattern and "reflective articulation," Ward Cunningham’s Sunday phrase for thinking out loud.
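To make the 15% figure concrete, here is a minimal sketch in Python. It rests on two assumptions of mine: I read the 15% as extra total person-hours compared with the same work done solo, and I assume both partners are present for the whole task, so the elapsed time is half the pair's person-hours. The function name `pairing_cost` and the 100-hour task are hypothetical and do not come from the study.

```python
def pairing_cost(solo_hours: float, overhead: float = 0.15) -> dict:
    """Compare one task done solo versus by a pair.

    solo_hours: person-hours one developer needs for the task (hypothetical input).
    overhead:   extra total effort a pair spends; 0.15 reflects the roughly 15%
                figure quoted above, read here as extra person-hours (assumption).
    """
    pair_person_hours = solo_hours * (1 + overhead)
    # Assumption: both partners work on the task for its whole duration,
    # so the calendar time is half of the pair's total person-hours.
    pair_elapsed_hours = pair_person_hours / 2
    return {
        "solo_person_hours": solo_hours,
        "pair_person_hours": pair_person_hours,
        "pair_elapsed_hours": pair_elapsed_hours,
        "extra_person_hours": pair_person_hours - solo_hours,
    }


if __name__ == "__main__":
    # Hypothetical task: 100 hours for a single developer.
    for name, hours in pairing_cost(100.0).items():
        print(f"{name}: {hours:.1f}")
```

Under these assumptions, a task that takes one developer 100 hours costs a pair about 115 person-hours and finishes in about 57.5 hours of elapsed time. The extra 15 or so person-hours are what the reduction in defects and rework has to repay for pairing to come out ahead.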

Do I have an opinion?

I’ve been exploring the question, "Does pair programming work?" mainly by looking at things other people have said and done. Does that mean I have no opinion of my own? As luck would have it, I do have an opinion.

One of the benefits I’ve experienced is that I can get a much clearer understanding of requirements, and my understanding tends to be more consistent with that of stakeholders and team mates than when I work solo. This has the downstream effect of reducing defects and rework. The net impact on delivery effectiveness, when this effect is extrapolated to the team level over the course of a development project, is very significant.

By collaborating with testers in short feedback loops, one User Story at a time if not one task at a time, I have learned more about what testers expect to see, and what they expect to be able to do when code is delivered to them. Better still, by pairing directly with testers I have been able to pick up a few testing skills for myself. Applying those skills during programming helps to reduce defects and rework.

In my experience, pairing helps us maintain self-discipline. The evidence suggests self-discipline doesn’t maintain itself automatically (to say the least). Working with a partner, would you ignore coding standards, forget to refactor incrementally, gloss over unit test cases, or keep code checked out all day long? Probably not. When we pair, it’s easy for us to keep each other honest. The effect? Reduced defects and rework.

So, my opinion is yes, it works, provided the preconditions for it are met and we use it with self-discipline.


Pair programming works when it works. Just beware of binary thinking, and watch where you swing that hammer.

6 thoughts on “Does pair programming work?”

  1. Hi,

    Very interesting. I looked at psychology of pair programming for my PhD. In addition to the studies you mentioned I found Nosek’s studies of large, experienced projects pretty compelling.

  2. Subjective opinions are unconvincing. I think it would be beneficial to expand the studies section. For this reason, “Making Software: What Really Works and Why We Believe It” remains an important, underrated book. Chapter 17 on Pair Programming provides a good overview on the practice and the mostly positive impact on software development. As for TDD, BDD and specification by example, I am doubtful of these techniques. The important thing here is the specification and examples are a mostly necessary, but insufficient approach. As an industry we need more rigorous and cost-effective techniques (e.g. Quickcheck, lightweight formal specs, design by contract, etc.).

  3. Thanks for putting this together – I’m glad to have read it. The part about personality types reflected something I think I understand superficially, but need to take a deeper look at. The benefits of pairing on discipline really can’t be overstated – living up to high standards is hard and any extra support makes a big difference.

  4. Thanks to tweeps for all the retweets and kind comments:

    @amckinnell – Wow. Nicely said. Thank you for writing Does Pair Programming Work?

    @fredverheul – Interesting analysis. Not common in the SAP ecosystem yet.

    @lisihocke – Pairing helps us maintain self-discipline.

    @msnyder – Excellent post on developer context and pairing.

    @gil_zilberfeld – Excellent analysis that can be applied to other practices.

    @gasproni – Interesting post

    @pawelbrodzinski – This is profound knowledge on pair programming. A really good piece.

    @annwitbrock – It’s long but it’s a thoughtful analysis of preconditions for effective pair programming.

    @PeterKretzman – “Those taking a purely binary stance on such Qs must have something to sell.”

    @mcornell – There’s a lot in there about reqs too.

    @unacoderX – tl;dr warning

    @geertbollen – Nice. +1
