Monday, June 1, 2015

Making Friends with Merge Commits

[NOTE: At the moment, all Git commit graphs on this page are drawn with gitgraph.js. This makes creating content a bit easier, but means that if JavaScript is off, you won’t see the graphs. I hope to fix this, since it goes against my philosophy of not breaking graceful degradation; however, I’m not sure what the best way to do so would be. Please leave suggestions in the comments if you have them. Thanks!]

Like many Rails developers, I’m a strong believer in effective use of version control. I switched from Subversion to Git around 2008, and am quite convinced that Git’s decentralized model with easy merges is light-years ahead of anything Subversion could ever have dreamt of. I’ve even written some tools to help my employer at the time migrate a large, complex repository from Subversion to Git without losing its branching structure.

When I come on board a new project, I make it a point to look at the structure of the Git repository. As you might expect, I’ve used Git on many projects in the last few years, and I’ve seen a number of different approaches to branching structure. Most of them work more or less well, but there’s one extremely popular structure that does not. Without exception, I know that if I see this branching structure in a repository, then it will be difficult or impossible to use Git effectively with that repository.

What is that branching structure, you may ask? Behold:

(Older commits are towards the bottom.)

So what’s wrong with a straight commit history?

Nothing—if there was actually only one line of development. For a simple project with only one developer, working on tasks on a completely serial basis, sometimes a straight commit history is fine.

But do you really believe that Anne created the administrative interface in one blinding stroke of inspiration, and that she didn’t commit anything till she was done? If she did that, then she’s not committing often enough. That’s what the commit history implies, but we know that that couldn’t have been what actually happened. And the point of a version control system is to store what actually happened—see Paul Stadig’s excellent article Thou Shalt Not Lie for more on this issue.

In this case, we can assume that each feature was created on its own branch, and then integrated with git merge --squash, which squashes all the changes into one commit for integration. (A different sort of straight history might also have been created by means of fast-forward merges; this is better, but not good enough, as we’ll see in the next section.) The advantage of git merge --squash, of course, is that each feature appears in the integration branch as one unit; the disadvantage is that it’s a monolithic code dump. If a later maintainer finds a questionable line of code in the administrative interface, and wants to see the context in which Anne wrote that line and figure out what she was trying to do, the maintainer can’t—the context is gone forever, because squashing is lossy and irreversible.

There’s a worse problem here too. git merge --squash creates a new commit with all the changes from the pre-squash commits, but Git doesn’t know that the new commit bears any relation to those pre-squash commits. This undermines all sorts of useful Git history analysis tools such as git branch --contains and git bisect, since from master’s point of view the original commits do not exist. In fact, it’s very similar to how Subversion does merges, but even less convenient: recent versions of Subversion at least retain a small amount of parentage information on merges, whereas Git really doesn’t for merge --squash.

The end result of this behavior, both in Subversion and Git, is that a branch is “dead” after an svn merge or git merge --squash. If Anne discovers, after git merge --squash, that one of her views has a typo in it, it is extremely inconvenient to correct it on the feature branch and merge it in, because Git has no record of the connection between master and the feature branch when merges are done this way.
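To make this concrete, here is a small throwaway-repository sketch of the squash workflow. The branch name admin-interface, the file names, and the commit messages are all made up for illustration; the only things it tries to show are that the squashed commit has a single parent and that Git still lists the feature branch as unmerged.

```shell
#!/bin/sh
# Sketch only: branch, file, and message names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the article
git config user.name "Anne"
git config user.email "anne@example.com"
git config commit.gpgsign false

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Anne makes two work-in-progress commits on a feature branch.
git checkout -q -b admin-interface
echo "views" > views.txt
git add views.txt
git commit -q -m "Add admin views"
echo "html" > html.txt
git add html.txt
git commit -q -m "Improve HTML."

# Integrate with a squash: the changes are staged, then committed by hand.
git checkout -q master
git merge -q --squash admin-interface
git commit -q -m "Add administrative interface"

# Number of parents of the squashed commit (fields after the commit hash).
parent_count=$(git rev-list --parents -n 1 master | awk '{print NF - 1}')
# Branches whose tips are not reachable from master.
unmerged=$(git branch --format='%(refname:short)' --no-merged master)

echo "parents of squashed commit: $parent_count"
echo "branches Git considers unmerged: $unmerged"
```

Even though every change from the feature branch is now in master, Git has no way to know that, so admin-interface still shows up as unmerged.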

Fast-forward merges: better than squashing, but not much

One way of keeping the relationship between branches is to use a fast-forward merge. This is the type of merge that Git does by default whenever it can: it simply fast-forwards the head of the older branch to the newer branch, without using a merge commit. So if Anne’s local tree looks like this:

then git merge will make master look like this:

This has some advantages over git merge --squash. Every commit that Anne made is present in the master branch, so we can examine where each line came from and hopefully see a meaningful commit message rather than a monolithic code dump. Furthermore, each commit in master maintains its identity from the feature branch (note the identical hashes), so Git can tell they’re related.

However, this technique also has a couple of huge disadvantages. First of all, it only works if master hasn’t moved on while Anne has been developing (or if Anne runs git rebase before merging, which has its own problems). More importantly, though, while the commit messages may be meaningful in Anne’s local context, something like “Improve HTML.” isn’t very meaningful as a commit message in the master branch by itself. In other words, while this type of merge preserves the context of each work-in-progress commit, it loses the context of the feature as a whole—that is, the context in which the commits were actually developed—in a way that is still lossy and irreversible. Also, the integration branch very quickly gets cluttered with a lot of small commits that aren’t really integration commits.
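As a rough illustration of the fast-forward case, the sketch below builds a scratch repository and merges a feature branch into a master that has not moved. The branch name, files, and messages are hypothetical. It checks that master ends up on exactly the same commit as the feature branch and that no merge commit was created.

```shell
#!/bin/sh
# Sketch only: branch, file, and message names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Anne"
git config user.email "anne@example.com"
git config commit.gpgsign false

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b admin-interface
echo "views" > views.txt
git add views.txt
git commit -q -m "Add admin views"
echo "html" > html.txt
git add html.txt
git commit -q -m "Improve HTML."
feature_tip=$(git rev-parse admin-interface)

# --ff is Git's default; it is spelled out in case local config disables it.
git checkout -q master
git merge -q --ff admin-interface
master_tip=$(git rev-parse master)

# Count merge commits reachable from master.
merge_count=$(git rev-list --merges --count master)

echo "feature tip: $feature_tip"
echo "master tip:  $master_tip"
echo "merge commits on master: $merge_count"
```

The hashes match, which is the good news; the zero merge commits is the bad news, since nothing in master records where the feature began or ended.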

Fortunately, there’s a better way.

Merge commits: context with encapsulation

Let’s return to Anne’s local tree in the example above. In order to merge into master in an encapsulated way, yet keep the advantages of preserving the intermediate commits, Anne can use git merge --no-ff. This prevents Git from doing a fast-forward merge, and instead creates a merge commit (5555555 in this example)—that is, a commit with multiple parents:

This is clearly the best of both worlds. Master now contains every commit that Anne made on her local branch, with the original history information intact, but because they’re all encapsulated in a merge commit, we also know the context in which each commit was made, just by noting which branch it’s part of. In other words, this technique is lossless. Furthermore, it will work even if master diverges; all Anne has to do to stay current is merge master into her development branch.
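Here is a minimal sketch of the same scenario with git merge --no-ff, again in a scratch repository with made-up branch, file, and message names. It also has Anne fix a typo on the feature branch afterwards and merge again, to show that the branch stays usable after the first merge.

```shell
#!/bin/sh
# Sketch only: branch, file, and message names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Anne"
git config user.email "anne@example.com"
git config commit.gpgsign false

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b admin-interface
echo "views" > views.txt
git add views.txt
git commit -q -m "Add admin views"
echo "html" > html.txt
git add html.txt
git commit -q -m "Improve HTML."

# Force a merge commit even though a fast-forward would be possible.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'admin-interface'" admin-interface
parent_count=$(git rev-list --parents -n 1 master | awk '{print NF - 1}')

# Later: fix a typo on the feature branch and merge it in again.
git checkout -q admin-interface
echo "typo fixed" >> views.txt
git commit -q -am "Fix typo in view"
git checkout -q master
git merge -q --no-ff -m "Merge typo fix from 'admin-interface'" admin-interface

merge_count=$(git rev-list --merges --count master)
unmerged=$(git branch --format='%(refname:short)' --no-merged master)

echo "parents of first merge commit: $parent_count"
echo "merge commits on master: $merge_count"
echo "unmerged branches: ${unmerged:-none}"
```

Because the merge commit records the feature branch as a second parent, Git knows the branch has been integrated, and the follow-up fix merges cleanly.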

I have sometimes heard the objection that this makes git log at the command line harder to use, because it’s harder to tell what goes with what. There are three easy ways to address this:

  1. Use git log --merges to filter out non-merge commits. On an integration branch, everything should be encapsulated in a merge commit, so this gives you output equivalent to what you’d see after git merge --squash, but without the loss of information entailed by squashing.
  2. Use git log --graph, which gives you an ASCII representation of the tree structure.
  3. Use a graphical tree visualizer such as gitk, GitX (other forks exist), or SourceTree; many others exist.

My preferred solution is usually option 3; command-line tools just aren’t very good at displaying arbitrary directed acyclic graphs. (The GitHub desktop application is also bad at displaying these structures, or was the last time I tried it; please use something else to look at your commit tree.)
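If you want to try the two command-line options, the sketch below builds a scratch repository with one --no-ff merge and runs both. The repository contents are invented for the example, and --format=%s is used only to keep hashes out of the output.

```shell
#!/bin/sh
# Sketch only: branch, file, and message names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Anne"
git config user.email "anne@example.com"
git config commit.gpgsign false

echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b admin-interface
echo "views" > views.txt
git add views.txt
git commit -q -m "Add admin views"
echo "html" > html.txt
git add html.txt
git commit -q -m "Improve HTML."

git checkout -q master
git merge -q --no-ff -m "Merge branch 'admin-interface'" admin-interface

# Option 1: only merge commits, one subject line each.
merges_only=$(git --no-pager log --merges --format=%s master)
# Option 2: ASCII graph of the full history.
graph=$(git --no-pager log --graph --format=%s master)

echo "--- git log --merges ---"
printf '%s\n' "$merges_only"
echo "--- git log --graph ---"
printf '%s\n' "$graph"
```

The first listing reads like a squashed history, one line per feature, while the second shows that the individual commits are still there underneath.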

Postscript, and a couple of requests

As I’m sure you can tell from this post, I believe strongly that smart use of merge commits is the key to maintaining a well-organized Git repository. I hope I’ve gone some way toward convincing you of that.

This is a complex topic, and I don’t claim to have touched on all the arguments in support of my position. If there’s something I should clarify, please let me know and I’ll update accordingly.

I should also note that I’ve seen a lot of dislike of Git merge commits among my colleagues, and I don’t understand that dislike at all. If you dislike merge commits, please leave a comment to tell me why, so that I can update this post to make it more relevant to you. Thanks!

Sunday, May 31, 2015

Welcome to Rails Freak!

Welcome! I’m very pleased to be starting Rails Freak, a place for my occasional essays and rants regarding Rails development and other related issues.

Who am I?

My name is Marnen Laibow-Koser. I’ve been developing Web applications since 1999, and working with Rails almost continuously since just before version 1.0 was released in 2007; I used to be quite active on the Rails mailing list, and may be again someday. I know Rails well and love it, but I also try to bring the lessons learned in my pre-Rails days to my Rails development.

Why this blog?

Honestly, I’m terrible at maintaining blogs: I have relatively little spare time, and many projects to fill it with. But I hope I have some things of value to say about Rails development, and this seems like a good way to do so.

Why is it called Rails Freak?

The word “freak” has two meanings:

  1. Someone passionate about something; a fan.
  2. Someone out of the ordinary, unusual, perhaps even heretical.

I am a Rails freak in both senses.

  • I love Rails and am passionate about getting the most out of it and evangelizing it to others.
  • While I like the “Rails way” in general, there are many specific cases where I believe there are better ways to do things. I hope to use this blog to discuss how to improve on certain of the Rails norms without doing violence to the basic spirit of Rails and the wonderful things it makes possible.

Will this blog cover $TOPIC?

Is there something you’d like me to write about? Do you have an idea for a guest post? Leave a comment or e-mail me! While I’m not qualified to write about everything, I’ll take reader preferences very strongly into consideration as I create future posts.