Two or three days before vacation I listened to the (then) most recent episode of the FLOSS Weekly podcast featuring Git maintainer Junio Hamano. It was an interesting enough conversation that I decided to take a swing at using Git on top of the OSAF Subversion repository. After about a week of serious use I’m at a point where I have an idea of whether or not Git is going to be useful to me going forward, and at least a glimmer of an idea of how Git might fit into the larger picture at OSAF. In this and several subsequent posts I’ll try to lay that out.
Why Git?
Or, maybe, what is Git? In the words of its developers:
“Git is a popular version control system designed to handle very large projects with speed and efficiency.”
Git, like other similar version control systems, has been receiving a great deal of attention lately in open source circles. If my own experience is any indication I’d venture to guess this is because Subversion and its predecessor CVS are simply a poor fit for projects with a highly distributed development model. At this point I’m tempted to go out on a limb and say that Subversion and CVS are a poor fit for almost any project, but that ground has been covered.
While there are many differences between Subversion and Git, the one most often cited as the largest is the “centralized vs. distributed” philosophy. Projects managed with Subversion usually have one central repository that is managed like a castle. The list of people allowed to enter the castle is generally short and politically maintained. While those lucky few are generally allowed to bring anything they’d like into the castle, the central chambers (called the “trunk”) are closely watched and usually pristine.
Distributed version control changes the rules. Instead of a castle there exists a community of small houses. Usually there is one house in the center of town that everyone else bases their own house upon, and that periodically incorporates the best parts of all the other houses. While access to this central house is still usually a political matter, there is technically very little stopping any other house in town from becoming the center of attention.
Interestingly, I cared very little about this particular aspect of Git coming in. As a member of the elite “commit club” for Chandler Server I’m already at the center of town, and it’s not particularly in my interest for that center to move.
Instead, I decided at some point that Subversion gets in my way and makes me work less efficiently. While Subversion is marvelously complex and good at maintaining a code history for a large project, there is a staggering amount of information that it, with the help of good developers everywhere, completely ignores. This information loss is due to the nature of the stately castle that is centralized version control, and looks like this:
Developers spend a lot of time writing computer code. Frequently this code is in a state that would not be acceptable to release to the public. Maybe a component is half implemented and will be done in an hour or two, or perhaps a developer simply hasn’t run tests to ensure his code works and has not introduced painful behavioral regressions. Whatever the case, allowing this code to enter the inner sanctum of Subversion trunk can invite a host of bad feelings from other contributors. A broken trunk can, in many cases, lead to a total shutdown of developer productivity, and is something to be avoided at all costs. As a result, developers using Subversion can often go long stretches without interacting with the version control system. It is in the time between these careful conversations between man and machine that untold quantities of information are lost. To understand that, let’s look at a couple of examples.
1) Lost Threads
This one has happened to me a couple times. I’ll be working along on a feature or a bug and realize I have a fantastic solution to a tricky problem that will require a little refactoring. Since my code is nowhere near a mature state and totally untested, I won’t even think about checking it in to Subversion. Thirty minutes later I’ll have my intended changes just about finished, only to realize that the entire idea was either fundamentally flawed or just not a cleaner solution than the previous code. With Git I would: a) make a commit as soon as I have my “fantastic” idea, b) refactor away, c) make a commit after I realize it is a bad idea and finally d) roll my working branch back to its pre-refactored self, being careful to keep the refactoring commit available in the branch history in case I want it in the future.
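The four steps above might look roughly like the following at the command line. This is only a sketch under stated assumptions: it runs in a throwaway repository, and the file name feature.txt, the commit messages and the branch name abandoned-refactor are all made up for illustration. Parking the failed attempt on its own branch is one of several ways to keep that commit reachable.

```shell
#!/bin/sh
# Sketch of the "lost thread" workflow in a throwaway repository.
# All file, branch and identity names below are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# a) Commit as soon as the "fantastic" idea strikes.
echo "original code" > feature.txt
git add feature.txt
git commit -q -m "Checkpoint before refactoring"

# b) Refactor away, then c) commit once the idea turns out to be a bad one.
echo "refactored code" > feature.txt
git commit -q -a -m "Refactoring attempt that did not pan out"

# d) Keep the attempt reachable on its own branch, then roll the
#    working branch back to its pre-refactored state.
git branch abandoned-refactor
git reset -q --hard HEAD~1

cat feature.txt
git --no-pager log --oneline abandoned-refactor
```

After the reset, the working branch no longer contains the refactoring, but `git log abandoned-refactor` still shows it, so the work can be picked up again later.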
2) Ignorance Sucks
This one probably never would have occurred to me before I started working with Git. Here’s how it looks: I’m working away at a bug I’m interested in, a reasonably complex feature touching both network communication and UI code. All of a sudden, out of nowhere, an absolutely critical, must-be-done-now kind of bug rolls down the pike. Work on my previous bug must stop immediately, and work on the new bug must start as soon as possible. Trouble is, this new bug touches code I’ve been working on. Do I a) throw away the stuff I’m working on or b) go through the hassle of getting a new copy of the appropriate branch across the network and work there? With Git, this dilemma never comes up: instead of doing my work on the actual “trunk” or point release branch that I plan to push to the wider world (that is, push back to a Subversion repository via a special gateway utility), all of the development for bug X happens on a branch called bugX. Branches in Git are incredibly lightweight and easy to manage, and the merging algorithm is supposed to be top notch (I haven’t kicked the tires hard yet). When that life-threatening bug comes down the pike I a) commit, b) switch to the “public” branch I need to base the bug fix on, c) make a new branch for the new bug, d) fix the bug and finally e) merge the new branch back to the main branch and push these changes public. The problem here is that Subversion has no understanding of the differences between these “topics” of development. By making branches so easy to use that moving between them is almost trivial, Git turns branching into a mechanism for keeping those topics separate.
The underlying problem in both these cases is that many situations come up frequently in software development that Subversion either has no capacity to understand or can only handle once social or political factors are taken into account. By providing natural solutions to these problems, Git suggests it may be a better mousetrap.
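As a rough illustration of steps a) through e), here is a sketch that runs in a throwaway repository. The branch name bugX comes from the text; the hotfix branch, the file names and the commit messages are invented for the example. The final push to the public repository is left out because there is no remote here, and the urgent fix touches a separate file so the sketch does not have to deal with conflicts.

```shell
#!/bin/sh
# Sketch of topic branches in a throwaway repository.
# "hotfix", the file names and the identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# The "public" branch, whatever this Git version names it by default.
echo "stable code" > app.txt
git add app.txt
git commit -q -m "Initial public state"
main=$(git symbolic-ref --short HEAD)

# Ongoing work on bug X lives on its own branch.
git checkout -q -b bugX
echo "half-finished work" >> app.txt

# a) The urgent bug arrives: commit the work in progress.
git commit -q -a -m "WIP on bug X"

# b) Switch to the public branch and c) branch off for the new bug.
git checkout -q "$main"
git checkout -q -b hotfix

# d) Fix the bug.
echo "critical fix" > fix.txt
git add fix.txt
git commit -q -m "Fix the critical bug"

# e) Merge the fix back into the public branch (a fast-forward here).
git checkout -q "$main"
git merge -q hotfix

# Pick the interrupted work back up exactly where it was left.
git checkout -q bugX
```

At the end, the public branch carries the fix while bugX still holds the half-finished work, untouched by the interruption.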
In my next Git post I’m going to run through setting up a Git-SVN gateway and some examples of basic Git usage as it applies to the Chandler Server project.