Monday, November 15, 2010

What Continuous Integration Is Really About

Recently, I have been encountering a number of environments where developers work in multiple branches and do not integrate their code until the end of the iteration. They often end up spending hours fighting to merge the code correctly, sometimes introducing bugs or losing features along the way.

When I see that, I cannot help but remember the pains of integration in 6-month-long Waterfall projects. As a junior developer, I once worked in an environment where developers spent 6 months implementing features in isolation from each other and integrated only right before the project deadline. As a result, they would run into enormous integration issues and spend 3 additional months fixing all of them before finally delivering.

Now, developers who integrate at the end of the iteration often end up with a similar result on a smaller scale. They sometimes miss the deadline by a day or more, and end up with issues bleeding into the next iteration (e.g. missing features due to a bad merge).

When I encounter such environments, and hear that developers branch out at the beginning of every iteration before developing their own features, I shudder and point out that they are not following the Agile practice of Continuous Integration. They immediately shoot back with something like "We have CruiseControl set up" or "We do not have the resources to set up a CI server", which only reveals ignorance about what Continuous Integration is really about. What I was actually saying is that they are not integrating continuously into one common branch, and thus not resolving integration conflicts on an hourly or daily basis. Instead, they let conflicts accumulate until the end of the iteration, causing an integration snowball effect.

It is an unfortunate matter of human nature to be lazy about acquiring knowledge. You always want the least amount of learning to get you where you want to go, so people often fail to dig deeper than what they hear and miss out on the essence of what they are learning. For example, a lot of developers learning MVC from frameworks like Struts or earlier editions of Rails know just enough of MVC to get by, but never spend time digging into the true essence of MVC from Smalltalk applications (or desktop development in general), and thus fail to apply it correctly. They end up with bloated controllers, instead of moving most of the non-control behavior into models. By the same token, a lot of developers who hear of Continuous Integration through the marketing lingo of CI servers think that a CI server is what Continuous Integration is all about.

Here is how Martin Fowler describes Continuous Integration:
Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.


Notice how the primary emphasis is on members integrating their work frequently: at least daily, if not multiple times a day. Also, notice that the automated build is secondary and only there to support the primary goal. So, when developers work in their own branches and do not integrate until the end of the iteration, they are not fulfilling the primary goal of resolving conflicts often, before they get big and hard to resolve, and having a CI server does not make them a team that is properly doing Continuous Integration. While a CI server certainly helps when they integrate at the end of the iteration, they still have to deal with bigger integration issues than if they were integrating daily, if not hourly.

Now, working in branches certainly has its place. It is useful when doing a spike, building an experimental feature, performing big architectural changes, or even working on a separate release altogether that would not go out until a few months later. Of course, in the case of a separate release, the code would probably not get merged back into master and can be thought of as a separate project (even if it branched off the original project's code base). And, in the case of big architectural changes, it is preferable, if possible, to do them in small slices within iterations, relying on a branch only as a last resort.

Local branches in source code control systems like Git and Mercurial have their place too. You can perform work in a local branch every day if you like as long as you integrate it back to the main branch at the end of the day or every few hours. Used that way, it would still be in line with the practice of Continuous Integration.
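As a rough sketch of what that routine could look like, here is a small Python helper that lists the Git commands for one integration round: update the main branch, merge the local branch into it, verify with the tests, push, and go back to work. The function names, the branch name "my-feature", the remote name "origin", and the test command are all assumptions made up for illustration, so adjust them to your own project.

```python
import subprocess
from typing import Callable, List, Optional


def integration_commands(
    topic: str,
    main: str = "master",
    remote: str = "origin",
    test_cmd: Optional[List[str]] = None,
) -> List[List[str]]:
    """Return, in order, the commands for one integration round.

    All names here are illustrative assumptions, not a prescribed workflow.
    """
    commands = [
        ["git", "checkout", main],      # switch to the shared branch
        ["git", "pull", remote, main],  # bring down everyone else's work
        ["git", "merge", topic],        # integrate the local branch
    ]
    if test_cmd is not None:
        commands.append(test_cmd)       # verify the integration before sharing it
    commands += [
        ["git", "push", remote, main],  # publish the integrated result
        ["git", "checkout", topic],     # return to the local branch
    ]
    return commands


def integrate(
    topic: str,
    main: str = "master",
    remote: str = "origin",
    test_cmd: Optional[List[str]] = None,
    run: Callable = subprocess.run,
) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in integration_commands(topic, main, remote, test_cmd):
        run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan instead of executing it, so this is safe to try anywhere.
    for cmd in integration_commands("my-feature", test_cmd=["rake", "test"]):
        print(" ".join(cmd))
```

Because `check=True` makes `subprocess.run` raise on a non-zero exit code, a merge conflict or a failing test stops the round before anything is pushed. Run a few times a day, each round only has to deal with a few hours of other people's changes.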

Takeaway?

Integrate early and often on the same branch (daily/hourly) and you will leverage the benefits of Continuous Integration on your Agile project by delivering more on time and avoiding big merge/conflict issues.

1 comment:

David Carver said...

For git, it is important for local branches to sync up with the central repository (if there is one) on a daily basis, to bring down any changes that may be there and then merge them into your current topic branch.

Also, I highly recommend that developers using git set up their own local Hudson system, monitoring their local git repository. Another thing developers are horrible at doing is making sure they run their unit tests after they commit.