In my current project, I’ve been trying to help my team understand Git. Recently, we’ve been having problems with broken commits, lost history, and errant merges. So I figured it was time to switch to a rebase workflow, and to help them out by writing a bit of a tutorial on Git rebase, just to make their lives easier.[1]
[Note: there's a newer post that describes how to use this rebase workflow with branches and pull requests in the way that GitHub themselves use to maintain an always-deployable master branch. Check it out.]
First things first: let’s review a bit about branches.
When working with source control, you can use branches. These are, in effect, copies of the codebase that you can work on, each used for a different purpose. For instance, you can have one version of the code that everyone is using in production (we’ll call this the master branch), and you can create other branches to develop and test features without borking your production branch.
Now, there are essentially two types of branches you can create: a public branch and a private branch.[2] A public branch is a branch that everyone pulls from. In other words, it is a branch that everyone has. In order to create a public branch, you push it to the remote repositories (either a centralized remote, like GitHub, or individual people’s repos if that’s possible).
    $ git clone git://github.com/mettadore/tutorials.git
    $ cd tutorials
    $ git branch
    * master
    $ git checkout -b public
    Switched to a new branch 'public'
    $ echo "Some random change" > README
    $ git commit -a -m "Update readme"
    $ git push origin public
    $ git branch
      master
    * public
In this series of commands, we clone and enter the tutorials repository, look to see that we are on the master branch, create a new branch called public, make a change to the README file, commit the change, push the public branch to the remote repository, and then see that we are still on the public branch.
The important point of this is the second to last command: git push. This publishes the branch to the remote, so anyone with access to that remote can pull from it.
By contrast, if you do all of the above but omit the git push command, you create a private branch. A private branch is a branch that only resides in your local repository. No one else knows about it, or can pull from it.
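To make the difference concrete, here's a minimal sketch that builds a throwaway repository and checks which branches the remote actually knows about. The bare repository in a temp directory stands in for the shared remote, and the branch name private, the file scratch.txt, and the user identity are made up for illustration.

```shell
set -eu
work=$(mktemp -d)

# A bare repository stands in for the shared remote (assumption for this demo).
git init -q --bare "$work/remote.git"

git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is named master
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo "first" > README
git add README
git commit -q -m "Initial commit"
git push -q origin master

# Public branch: created locally, then pushed.
git checkout -q -b public
echo "Some random change" > README
git commit -q -a -m "Update readme"
git push -q origin public

# Private branch: created locally and never pushed.
git checkout -q -b private
echo "An experiment" > scratch.txt
git add scratch.txt
git commit -q -m "Try something out"

# Ask the remote which branches it has.
remote_heads=$(git ls-remote --heads origin)
echo "$remote_heads"
```

The remote lists master and public, but private exists only in the local repository.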
Private branches are for you. Most of the time, you’re likely just hacking around and trying shit out. It’s code that’s untested, unverified, maybe prototypical. You don’t need to push this code to the public because no one cares about it.
Public branches are for your team. Often, more than one person works on something that needs to stay out of the production branch, like a new feature.
Now, when you have a branch, you eventually have to bring the changes from that branch back into the production branch. The most common way to do this is by merging. When you merge two branches, you are merging just the code. In effect, you are saying “take this snapshot of the branch and combine it with this other branch.”
For instance, our team has two public branches: master and public. Let’s say I only created public to build some new feature, so I make some changes to the public branch to prototype it, test it, etc. While I am working on that public branch, my team continues working on the master branch, fixing bugs, doing scheduled updates, etc.
Now, I’m done with my new feature, and want to make it part of the production codebase. So I merge it. Here’s an example of what my log would look like:
This log shows all the changes to our master branch, including the merge. But what does that merge really tell us? Almost nothing, actually.
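This is one, though not the only, problem with merging: you pull in the latest version of the code, but the merge commit tells you nothing about the process you went through to get there. The individual commits from the other branch are still in the repository, but they end up tucked behind the merge or interleaved with everyone else's work, so the story of that branch is effectively lost to anyone reading the log.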
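Here's a small sketch of that merge in a throwaway repository, so you can look at the resulting log yourself. The file names, commit messages, and user identity are invented for the demo; the only point is that the merge is recorded as one extra commit with two parents.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is named master
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "v1" > README
git add README
git commit -q -m "Initial commit"

# Work on the feature branch.
git checkout -q -b public
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Prototype the feature"
echo "tested" >> feature.txt
git commit -q -a -m "Test the feature"

# Meanwhile, the team keeps working on master.
git checkout -q master
echo "fix" > bugfix.txt
git add bugfix.txt
git commit -q -m "Fix a bug"

# master has moved on, so this merge creates a merge commit.
git merge -q --no-edit public

git log --oneline --graph
merge_count=$(git rev-list --count --merges master)
echo "merge commits on master: $merge_count"
```

Running git log --first-parent --oneline on the result shows master's own commits plus the single merge entry, which is roughly the "almost nothing" view described above.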
To understand what’s going on, it’s helpful to understand a wee bit more about a branch.
Let’s say you have a production branch called master. You’re on commit E on master, and decide to make a branch called topic. Now, Git knows that the topic branch is related to master/HEAD, i.e. it is related to the most recent commit of the master branch, which is commit E.
Everything is fine until someone makes a change to master and commits it. Now, master is on commit G. This means that the topic branch is no longer related to the master/HEAD (the latest commit) of the master branch because master/HEAD now points to commit G, which topic doesn’t even know exists.
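You can ask Git directly which commit two branches have in common with git merge-base. The sketch below recreates the E-then-G situation in a scratch repository (the commit contents and user identity are placeholders) and shows that topic is still attached to E after master moves on.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is named master
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "E" > README
git add README
git commit -q -m "Commit E"
commit_e=$(git rev-parse HEAD)

# topic is created while master is at E.
git branch topic

# Someone commits to master: F, then G.
echo "F" >> README
git commit -q -a -m "Commit F"
echo "G" >> README
git commit -q -a -m "Commit G"
commit_g=$(git rev-parse master)

# The commit that topic and master share is still E, not G.
base=$(git merge-base topic master)
echo "commit E:    $commit_e"
echo "master HEAD: $commit_g"
echo "merge base:  $base"
```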
Our branches are now strangers. They have grown apart, maybe never to return (sometimes branches just go their separate ways and become new software).
So, since they are no longer related, the changes to one are unrelated to the changes to the other. Thus, when you bring them back together by merging topic into master, you’re actually saying “I know that the individual changes in these two branches don’t relate to each other at all, so don’t bother lining them up. All I care about is shoving the latest version of topic into the latest version of master.”
This explains why the commit history for topic gets buried when you merge, and also why you need to pull the master branch immediately prior to merging: you need the absolute latest version of master.
It also explains why things get difficult when more than one person is working on a branch. What if someone else pulls the master branch for a merge just as you are pulling the master branch for a merge? This is one time, but not the only time, that hell breaks loose.
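That race is easy to reproduce. In this sketch two developers (the names Alice and Bob, the file names, and the local bare repository standing in for the shared remote are all assumptions for the demo) start from the same commit, and whoever pushes second gets rejected.

```shell
set -eu
work=$(mktemp -d)

# A bare repository stands in for the shared remote.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# The first developer seeds the shared repository.
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Example"
git config user.email "alice@example.com"
git remote add origin "$work/remote.git"
echo "v1" > README
git add README
git commit -q -m "Initial commit"
git push -q origin master

# The second developer clones the same starting point and commits.
git clone -q "$work/remote.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob Example"
git config user.email "bob@example.com"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's change"

# Alice commits and pushes first.
cd "$work/alice"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's change"
git push -q origin master

# Bob's master no longer contains the remote's latest commit,
# so his push is refused until he integrates Alice's work.
cd "$work/bob"
if git push -q origin master 2>/dev/null; then
    push_result="accepted"
else
    push_result="rejected"
fi
echo "Bob's push was $push_result"
```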
We can avoid these problems by using rebase.
In our previous example, we had topic and master. Our topic branch was related to commit E on master, and when another developer changed master our relationship was broken, forcing us to merge to get it back in. Also, if we pulled the master, merged, and then tried to push, we might run into the problem of having to pull and merge again if someone else pushed first.
All of this because our branch relates to an older commit.
What if we could just update the commit that our branch relates to?
This is exactly what rebase does. When you have a branch, and use git rebase, you are saying “Hey, this branch was related to commit E, but they’ve done a bunch of work and are now on commit G, so go ahead and update this branch so it relates to commit G instead.”
You can use git rebase to rewrite the base of topic so that it is connected to the HEAD of master.
Let’s look at our tutorials repository for an example. Here’s our master branch:
and here is our new topic branch:
The topic branch is related to the master branch at the commit with the message “change README,” that is, they are the same up to that point.
In this picture, the commit with the message “change README” is E. After this point, the topic branch had a change to file_a and the addition of file_c. Simultaneously, the master branch had a change to file_b and more changes to the README file.
At this point, we can do the following:
    $ git checkout topic
    $ git rebase master
    First, rewinding head to replay your work on top of it...
    Applying: Change file_a
    Applying: Add file_c
Let’s look at our topic branch log now, to see what happened:
You can see that, whereas before the last common commit that topic and master shared was “change README,” now the last commit that they share is “Update readme again.” In other words, they share the last commit of master.
Seen another way:
The great thing about this is that we haven’t changed the master branch at all. We’ve merely brought our local branch up to date with it.
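If you want to try this without touching a real project, here's a sketch that rebuilds the same situation in a scratch repository. It reuses the file names and commit messages from the example above, but the file contents, the "Change file_b" message, and the user identity are placeholders I made up.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/tutorials"
cd "$work/tutorials"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is named master
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "readme" > README
echo "a" > file_a
echo "b" > file_b
git add README file_a file_b
git commit -q -m "Initial commit"
echo "more" >> README
git commit -q -a -m "change README"          # this is commit E

# topic branches off at E and gets two commits.
git checkout -q -b topic
echo "changed" >> file_a
git commit -q -a -m "Change file_a"
echo "c" > file_c
git add file_c
git commit -q -m "Add file_c"

# master moves on in the meantime.
git checkout -q master
echo "changed" >> file_b
git commit -q -a -m "Change file_b"
echo "again" >> README
git commit -q -a -m "Update readme again"
master_before=$(git rev-parse master)

# Re-base topic onto the latest master.
git checkout -q topic
git rebase -q master

base_after=$(git merge-base topic master)
master_after=$(git rev-parse master)
git log --oneline topic
```

After the rebase, the commit the two branches share is the tip of master, and master itself points at the same commit it did before.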
The best thing about this is that, since our changes are now related to the latest version of the master branch, we can fast-forward the master branch:
    $ git checkout master
    $ git rebase topic
    First, rewinding head to replay your work on top of it...
    Fast-forwarded master to topic.
This command lays the latest changes to topic right on top of the master branch and preserves all of your commit history, laying the commits right on the end of the master branch’s commit history.
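A quick sketch of the fast-forward step, again in a scratch repository with made-up file contents and identity: once topic has been rebased onto master, moving master forward adds no merge commit and leaves both branches on the same commit.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is named master
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "readme" > README
git add README
git commit -q -m "change README"

git checkout -q -b topic
echo "a" > file_a
git add file_a
git commit -q -m "Change file_a"

git checkout -q master
echo "again" >> README
git commit -q -a -m "Update readme again"

# Bring topic up to date with master first.
git checkout -q topic
git rebase -q master

# Now master can simply be moved forward to topic.
git checkout -q master
git rebase -q topic

master_tip=$(git rev-parse master)
topic_tip=$(git rev-parse topic)
git log --oneline master
```

In this situation git merge --ff-only topic would land master on the same commit; it just refuses to do anything if a fast-forward isn't possible.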
Another option is to do this:
    $ git checkout master
    $ git merge --squash topic
    $ git commit

The squash merge stages the combined changes from topic without committing them, so you finish with git commit. The result is a commit log much like a normal merge, meaning that all of the individual commit messages from the topic branch become one single “merge” message.
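Here's a sketch of the squash option on a scratch repository (file contents, the final commit message, and the identity are invented). It checks that master gains exactly one ordinary commit containing everything topic did.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is named master
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "readme" > README
echo "a" > file_a
git add README file_a
git commit -q -m "Initial commit"

git checkout -q -b topic
echo "changed" >> file_a
git commit -q -a -m "Change file_a"
echo "c" > file_c
git add file_c
git commit -q -m "Add file_c"

git checkout -q master
# Stage the combined result of topic; nothing is committed yet.
git merge -q --squash topic
# Record it as one ordinary commit.
git commit -q -m "Add topic work as one commit"

commit_count=$(git rev-list --count master)
git log --oneline master
```

Note that topic itself still holds the two original commits; only master sees them as one.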
Yet another option is to do this:

    $ git checkout topic
    $ git rebase -i master
    -> in the editor, leave the first commit as "pick" and change the rest to "squash"
    $ git checkout master
    $ git merge topic

which will create a single commit log entry, but preserve all of the commit messages from the topic branch in that entry. In other words, it condenses the entire history of the topic branch into one commit with a combined message. This is good if you want to keep the log messages but don’t care about keeping the individual commits.
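Interactive rebase normally opens your editor, which makes it awkward to demo. The sketch below scripts the editor instead, assuming the interactive rebase is run from topic against master: a tiny helper (the squash-editor.sh name is mine) rewrites every "pick" after the first into "squash", and GIT_EDITOR=true accepts the combined commit message as-is. File contents and identity are placeholders.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is named master
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo "readme" > README
echo "a" > file_a
git add README file_a
git commit -q -m "Initial commit"

git checkout -q -b topic
echo "changed" >> file_a
git commit -q -a -m "Change file_a"
echo "c" > file_c
git add file_c
git commit -q -m "Add file_c"

# Hypothetical helper that plays the role of you editing the todo list:
# keep the first "pick" and turn every later one into "squash".
cat > "$work/squash-editor.sh" <<'EOF'
#!/bin/sh
awk '/^pick / { n++; if (n > 1) sub(/^pick/, "squash") } { print }' "$1" > "$1.new"
mv "$1.new" "$1"
EOF

# Squash topic's commits into one, keeping all of their messages.
GIT_SEQUENCE_EDITOR="sh $work/squash-editor.sh" GIT_EDITOR=true git rebase -i master

# master can now fast-forward to the single squashed commit.
git checkout -q master
git merge -q --ff-only topic

squashed_message=$(git log -1 --format=%B master)
echo "$squashed_message"
```

In day-to-day use you would skip the two environment variables and edit the todo list by hand.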
An excellent part of using a rebase workflow is that you can use a private branch to work on, without needing to push that branch to the public repos. This allows you to work and discard at will.
The common workflow is this:

    $ git checkout master
    -> do some work and commit lots
    $ git pull
    $ git merge
    -> do some more work
Here’s a better workflow using Git:

    $ git checkout -b topic
    -> Work a bit and commit changes
    $ git fetch origin
    $ git rebase origin/master
    -> Work a bit and commit changes
    $ git fetch origin
    $ git rebase origin/master
    -> Work a bit and commit changes
    -> When you're ready to make everything official:
    $ git checkout master
    $ git merge topic
    $ git push
    $ git branch -D topic    <- or, as I do, keep using the local branch for more changes
What we do here is create a new branch and work on it while periodically rebasing it to master/HEAD. Finally, we merge the changes (laying our entire commit log on top of master) and delete the local branch. If we’d used a public branch, that would remain forever, cluttering up the repository. With a private branch we get all the benefits of the branch and the commit logs, with none of the clutter.
This means that you have to periodically rebase your local private branch to keep it up to date, yes. And rebasing your local branch means that you may have to resolve conflicts if you and someone else are working on the same files. But the conflict resolution happens on your local private branch, not on the public production branch.
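To see the whole loop end to end, here's a sketch with a local bare repository standing in for the shared remote and a second clone playing the teammate (both are assumptions for the demo, as are the file names and identities). It updates the private branch with git fetch plus git rebase origin/master, which is one way to re-base onto the remote's latest master without merging it into topic.

```shell
set -eu
work=$(mktemp -d)

# A bare repository stands in for the shared remote.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# My repository.
git init -q "$work/me"
cd "$work/me"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$work/remote.git"
echo "base" > README
git add README
git commit -q -m "Initial commit"
git push -q origin master

# A teammate's clone.
git clone -q "$work/remote.git" "$work/teammate"

# Start a private branch and do some work.
git checkout -q -b topic
echo "one" > feature.txt
git add feature.txt
git commit -q -m "Start feature"

# Meanwhile the teammate pushes a fix to master.
(
    cd "$work/teammate"
    git config user.name "Teammate Example"
    git config user.email "teammate@example.com"
    echo "fix" > bugfix.txt
    git add bugfix.txt
    git commit -q -m "Fix a bug"
    git push -q origin master
)

# Bring the private branch up to date, then keep working.
git fetch -q origin
git rebase -q origin/master
echo "two" >> feature.txt
git commit -q -a -m "Finish feature"

# Make everything official: fast-forward master, push, drop the branch.
git checkout -q master
git merge -q --ff-only topic
git push -q origin master
git branch -q -d topic

git log --oneline master
```

The published history is a straight line: the initial commit, the teammate's fix, then the two feature commits, with no merge commit and no leftover branch on the remote.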
As a summary:
Furthermore, when you merge your changes, you have three choices:
- Rebase and fast-forward, which keeps every commit from your branch on the end of master's history.
- git merge --squash, which folds your branch into a single commit with a single message.
- git rebase -i, which condenses your branch into one commit while keeping the individual commit messages in it.
Git rebase takes some getting used to, but the results are worth it. I highly recommend giving it a try.