Thursday 21 April 2011

Launchpad and stacked branches

As I'm sure most of you are aware, Launchpad hosts Bazaar branches. One early design decision that we had on Launchpad was that branches should be able to be renamed, moved between people and projects without limitation. This is one reason why each branch that was pushed to Launchpad had a complete history. We wanted to make sure that there weren't any problems where one person was blocking another pushing branches, or that people weren't able to get at revisions that they shouldn't be able to.

The Bazaar library, bzrlib, gives you awesome power to manipulate the internals giving you access to the repository of revisions through the branch. This can be a blessing and a curse, as if you have private revisions, then they can't be in a public repository.

Having a complete copy of the data for every branch became a severe limitation, especially for larger projects, of which Launchpad itself is one. A solution to this was a change in Bazaar itself that allowed a fallback repository which contained some of the revisions. This is what we call stacked branches. The repository for the branch on Launchpad has a fallback to another repository, which is linked to a different Launchpad branch. We ideally wanted all of this to be entirely transparent to the users of Launchpad. What it means is that when you are pushing a new branch to Launchpad, the bzr client asks for a stacked location. If there is a development focus branch specified for the project, this is then offered back to the client. The new branch then only adds revisions to its repository that don't exist in the development focus branch's repository. This makes for faster pushes, and smaller server side repositories.

The problem though was what do we specify the stacked on location to be? When we created the feature, we used absolute paths from the transport root. What the mean was that we stored the path aspect of the branch. For example, lp:wikkid gets translated to bzr+ssh://bazaar.launchpad.net/~wikkid/wikkid/trunk or http://bazaar.launchpad.net/~wikkid/wikkid/trunk depending on whether the bzr client knows your Launchpad id. The absolute path stored would be /~wikkid/wikkid/trunk. This information was then stored in the branch data on the file system.

The problem however was that the web interface allows you to rename branches. The actual branch itself on disk is referred to using a database id, which is hidden from the user using a virtual file system which has rewrite rules for http and at the bazaar transport level. However since the stacked on location refers to a full branch path, changing any part of that, whether it is the branch owner, branch name, or the project or package that the branch is for, would cause any branches stacked on that changed branch to break, bug 377519.

In order to fix this we had to change the location that the branch is stacked on to be independent of the branch path. The best solution here is to use the database id. I really didn't want to expose the user to this opaque id, but one opaque id is as good as another. Now when pushing branches to Launchpad, when it is creating a stacked branch you'll see a message like:

Created new stacked branch referring to /+branch-id/317141.

Existing branches still have their old branch paths saved for now. We'll run a migration script early next week to fix all these up, and hopefully we'll have seen the last of this bug.