Git Branches Demystified

By admin, 25th February 2016

git

One of the biggest obstacles to mastering git (if such a thing is possible) is understanding how branches work. Perhaps your everyday use of this version-control system is limited to the basic commands (add, commit, checkout, push, pull), which are enough to get by and get your work done. After all, your only concern is to push out code and deliver it to the customer, right?

In small teams, this workflow may be sufficient, maybe even great. However, as the team grows, cruft accumulates and some kind of “regular cleaning” becomes necessary to keep everyone on the team sane. This is done by means of “advanced” git commands such as cherry-pick, rebase and bisect.

But in order to use these commands properly, a strong understanding of branches and how they work is needed. Unlike the basic commands, these can’t be used blindly: you must know where you are and where you want to go, and branches are the signposts on the road that is the code repository.

Branches under the hood

The name “branch” suggests something completely different from what branches actually are. One may think of the branches of a tree, extending towards the sky as they grow, and therefore deduce that a branch is a series of commits linked together over time.

However, a branch is actually just a reference to a certain commit in the repository. You can construct a “path” of commits by adding commits on top of the current one (that is, the one the branch points to); each new commit moves the branch forward to point to it. These references can also be moved anywhere in the repository as you wish (by using the git reset command), so a branch is not bound to one series of commits for its entire lifetime.
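To make the idea concrete, here is a minimal sketch in Python. It is a toy model, not how git actually stores anything: the commits and branches dictionaries, the commit and history helpers and the ids c1, c2, c3 are all made up for illustration. The point is only that a branch is a name mapped to one commit, and that the “path” comes from following parent links.

```python
# Toy model (not git's real storage): every commit knows its parent,
# and a branch is just a name mapped to one commit id.
commits = {}    # commit id -> parent commit id (None for the first commit)
branches = {}   # branch name -> commit id

def commit(branch, new_id):
    """Add a commit on top of the branch tip and move the branch to it."""
    commits[new_id] = branches.get(branch)
    branches[branch] = new_id

def history(branch):
    """Walk parent links from the branch tip back to the first commit."""
    path, current = [], branches[branch]
    while current is not None:
        path.append(current)
        current = commits[current]
    return path

commit("master", "c1")
commit("master", "c2")
branches["testing"] = branches["master"]   # a new branch is a copied pointer
commit("testing", "c3")

print(history("master"))    # ['c2', 'c1']
print(history("testing"))   # ['c3', 'c2', 'c1']
```

Notice that creating testing copied a single value. The history it appears to “contain” is recomputed from the commit it points to every time we ask.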

[Figure: a repository with two branches, master and testing, pointing to different commits]

Here, commits are in yellow and branches are in green. In this example we have two branches: master and testing (the latter being the current branch). If we were to run git reset --hard master, the repository would look like this:

[Figure: the repository after the reset, with master and testing pointing to the same commit]

One word before moving on: the git reset command moves the reference of the branch you are on to the commit referenced by the destination branch.

Now testing and master point to the same commit. The commit testing originally pointed to is lost from view, since no branch points to it any more. It can still be recovered through the reflog for a while, but eventually git’s garbage collector figures out that nothing references those commits and deletes them:

[Figure: the repository after the unreferenced commits have been deleted]
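The reset and the cleanup that follows can be sketched with the same kind of toy model. Everything here is hypothetical (the ids c1 to c4, the reset and reachable helpers), and it deliberately ignores real-world details such as the reflog; it only shows why moving one pointer is enough to leave commits with nothing referencing them.

```python
# Toy model: commit id -> parent id, branch name -> commit id.
commits = {"c1": None, "c2": "c1", "c3": "c2", "c4": "c3"}
branches = {"master": "c2", "testing": "c4"}

def reset(current_branch, target_branch):
    """Move current_branch to the commit that target_branch points to."""
    branches[current_branch] = branches[target_branch]

def reachable():
    """Collect every commit that can be reached from some branch."""
    seen = set()
    for tip in branches.values():
        while tip is not None and tip not in seen:
            seen.add(tip)
            tip = commits[tip]
    return seen

reset("testing", "master")
unreachable = sorted(set(commits) - reachable())

print(branches)      # {'master': 'c2', 'testing': 'c2'}
print(unreachable)   # ['c3', 'c4']
```

Nothing was deleted by the reset itself: c3 and c4 are still in the commits dictionary. They are simply candidates for removal because no branch leads to them any more.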

The only purpose of a branch is to provide an “alias” for a certain commit. In fact, we don’t need branches at all: we could work our way through the repository using commit hashes instead! The previous command (git reset --hard master) could also be written as git reset --hard 129c23.. . Of course, navigating the repository this way would be extremely painful, hence the existence of branches.
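As a rough illustration of the “alias” idea, the sketch below resolves either a branch name or a hash prefix to the same commit. The commit ids and the resolve helper are invented for this example and are not git internals.

```python
# Toy model: full commit ids and the branches that point to them.
# These ids are made up for illustration.
commit_ids = ["a1b2c3d4", "5f0e7b10", "e41d9c02"]
branches = {"master": "a1b2c3d4", "testing": "e41d9c02"}

def resolve(name):
    """Turn a branch name or an unambiguous id prefix into a full commit id."""
    if name in branches:
        return branches[name]
    matches = [cid for cid in commit_ids if cid.startswith(name)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous revision: {name}")
    return matches[0]

print(resolve("master"))   # a1b2c3d4
print(resolve("a1b2"))     # a1b2c3d4
```

Both calls end up at the same commit; the branch name is just the version a human can remember.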

The HEAD branch

There is a special reference we haven’t talked about yet: HEAD, often loosely called the HEAD branch. Strictly speaking it is not an ordinary branch but a pointer that normally refers to the branch we are on, and through it to the current commit, that is, the commit we are working on and which determines the files in our working directory. HEAD is the default for most commands that accept a branch as input. When we talk about “moving to the X branch”, we mean making HEAD point to X, and therefore to the same commit as X. HEAD is also commonly used for relative movement across commits. The following example illustrates this:

[Figure: HEAD and master pointing to the same commit]

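Here, HEAD references the same commit as master. A tilde (the ~ symbol) followed by a number after the branch name indicates “backwards” movement by that many commits. For example, HEAD~2 means “two commits before HEAD”. A caret (the ^ symbol) means “the parent of”: HEAD^ is one commit back, and HEAD^^ is the same as HEAD~2. A number after the caret does not count commits; it selects which parent of a merge commit to follow, so HEAD^2 is the second parent of HEAD. This notation can be used with all branches, not just HEAD: master~2 works the same way.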
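The difference between the two suffixes is easiest to see on a small history that contains a merge. The sketch below is a toy resolver, assuming a made-up graph (c1, c2, c3, f1, m1) and a hypothetical resolve helper; it follows git’s rules that ~n walks n first parents back, while ^n selects the n-th parent of a single commit.

```python
import re

# Toy model: commit id -> list of parent ids (a merge commit has two).
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "f1": ["c2"],          # commit made on a side branch
    "m1": ["c3", "f1"],    # merge commit: first parent c3, second parent f1
}
refs = {"master": "m1", "HEAD": "m1"}

def resolve(expr):
    """Resolve expressions such as HEAD~2, master^ or HEAD^2."""
    name, suffix = re.fullmatch(r"([A-Za-z]\w*)((?:[~^]\d*)*)", expr).groups()
    commit = refs[name]
    for op, num in re.findall(r"([~^])(\d*)", suffix):
        n = int(num) if num else 1
        if op == "~":
            for _ in range(n):       # ~n: follow the first parent n times
                commit = parents[commit][0]
        elif n > 0:                  # ^n: step to the n-th parent (^0 stays put)
            commit = parents[commit][n - 1]
    return commit

print(resolve("HEAD~2"))    # c2
print(resolve("HEAD^^"))    # c2
print(resolve("HEAD^2"))    # f1
print(resolve("master^"))   # c3
```

HEAD~2 and HEAD^^ land on the same commit, while HEAD^2 steps sideways onto the branch that was merged in, which is why the two are easy to confuse.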

Given its flexibility and looseness, the name “branch” seems inappropriate for this concept. When using only the basic commands, however, it makes perfect sense, since from this point of view branches and commits are strongly linked and it’s logical to relate them to a tree branch. When doing things such as rebasing or cherry-picking, however, this kind of reasoning conflicts with the thought process involved in performing these tasks (if branches are fixed, how come we can move them around and change their history?).

I hope this article helps clarify some of the most common doubts newcomers have about git. It’s quite hard to wrap your head around it the first time, but with patience and practice, you’ll find it easier and easier to work with and to keep your code’s history clean for others to use. Thanks for reading!