SCM Migration Process
Introduction
Openbravo has been using a centralized Source Code Management system (SCM; Subversion) since April 2006. Now (November 2008), Openbravo has decided to migrate to a distributed SCM (DSCM; Mercurial). This page holds information related to this decision and to the migration process.
Requirements
Development:
- Branching and merging should be painless.
- Reasonably fast, e.g. merges should take less than 2 minutes.
- Easy to use, yet powerful.
- Eclipse support.
- Support for hooks (e-mail, CIA).
- Integrated with the Openbravo Issue Tracker (Mantis).
- Extensible.
- Share changes easily with other developers.
- Allow doing atomic changes without having Internet access.
Administration:
- Manage several code repositories in an organized way. Management of repositories should be easy.
- Control read/write access over the different repositories.
- Comprehensive URLs.
- Ability to encrypt sensitive data (credentials on pushes).
- Ability to have isolated (location) and at the same time related repositories (allowing merges).
- Easy to backup.
Others:
- Good documentation.
- Empower community contributions.
- Satisfy the needs of all the Openbravo professional tools and partners.
- Multi-platform: native Linux, Windows, Mac OS X and *BSD support.
- Useful web interface.
- Available easy-to-use graphical tools.
- Free software with an active community.
- UTF-8 support.
Why Mercurial
This section does not include an exhaustive analysis of all the alternatives. It is just an enumeration of the main reasons to use Mercurial.
Development:
- Branching and merging are everyday operations in Openbravo ERP development, and they are the natural operations in a distributed SCM like Mercurial.
- Mercurial is fast: diffs, merges, commits and reverts are all done locally. There's no server to ask for old revisions from a year ago. Maybe it's not as fast as git, but it's fast enough.
- It's simple to use. It does not have too many commands and most of them are like those in Subversion.
- Stable and actively developed Eclipse plugin, as well as Netbeans support.
- Complete support for handling repository events with hooks.
- Can be easily integrated with Mantis.
- Support for extensions. Some are included in the official distribution, while there are many others contributed by the community.
- Sharing changes with other developers is an easy task: through SSH or using the built-in web server.
- Offline work: commits are local, so developers can make atomic commits without Internet access. When reaching the server is time-consuming or expensive, all the changesets can be pushed later in a single step.
Administration:
- Easy to administer and organize using HTTP or SSH. Just organize the repositories in the file system.
- hg-admin-tools and PAM for SSH and user-group scripts for HTTP.
- Simple CGI/FastCGI/WSGI scripts. So URLs can be fit to anything.
- SSH encrypts all the traffic. The push_ssl (default) option forces pushes over HTTPS, while allowing HTTP for the rest of operations (keeping speed in mind).
- Mercurial is distributed: developers can work in cloned repositories independent of the central repository and merges are still possible between them.
- Backups are straightforward: do regular cloning.
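As a rough illustration of "do regular cloning", the backup could be a small shell function run from cron. This is only a sketch: the function name `backup_repo` and the example paths are assumptions, not part of the Openbravo setup.

```shell
# Sketch: keep a backup clone of one repository up to date.
# backup_repo and the example paths are hypothetical.
backup_repo() {
    src=$1   # repository to back up, e.g. /srv/hg/erp/devel/main (assumed path)
    dest=$2  # backup location, e.g. /backup/hg/erp/devel/main (assumed path)
    if [ -d "$dest/.hg" ]; then
        # The backup already exists: only bring in the new changesets.
        hg -R "$dest" pull "$src"
    else
        # First run: clone the history without checking out a working copy.
        mkdir -p "$(dirname "$dest")"
        hg clone --noupdate "$src" "$dest"
    fi
}
```

Calling `backup_repo /srv/hg/erp/devel/main /backup/hg/erp/devel/main` repeatedly keeps the backup current, because after the first clone only the missing changesets are transferred.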
Others:
- Excellent documentation: the hg help <command> integrated help, the wiki and the hgbook.
- Easier contributions: until now, in order to have the benefits of version control with Openbravo in a seamless fashion, one must be a core developer (i.e. someone with commit privileges on the master copy). People who are not core developers but wish to work with Openbravo's revision tree (e.g. anyone writing a patch or creating a custom version) do not have a direct tool support for revisions. This can be exasperating to people who wish to contribute with a patch of any complexity and want a way to incrementally save their progress to make their development lives easier.
- Partners can have their own independent development branches. The SCM is no longer a bottleneck for the professional tools.
- Native Linux, Windows, Mac OS X and *BSD support (virtually every platform supporting Python).
- Integrated web interface.
- Several GUI tools like TortoiseHg, hgk.
- It's free software (GPL2) and there's an active community on the mailing lists and on IRC (#mercurial on irc.freenode.net).
- Full UTF-8 support.
As of version 1.1, Mercurial has some limitations:
- Cannot clone just a subdirectory of a repository (partial cloning).
- Mercurial tracks files, not directories. This has one minor consequence: it is not possible to have empty directories. However, empty directories are rarely useful, and there are simple workarounds that may be used to achieve an appropriate effect.
Authentication
There are basically two ways of publishing repositories in Mercurial: through HTTP and through SSH. Both options have their pros and cons. HTTP is easier to implement and is available everywhere (not filtered). SSH gives a more granular control.
The decision is to go for the simplest solution. So we decided to first try with HTTP and only try SSH if we found that HTTP was not enough. It seems that starting with Mercurial 1.1 HTTP is enough for our needs. However, we may need some simple scripts to manage access through groups.
Every project will have at most four different zones:
- devel: everyone can read (no authentication needed) but only developers can write (pushes forced to be done over HTTPS).
- stable: not everyone can read and only developers can write.
- release: everyone can read (no authentication needed). Only the Release Management Team can push here (forced over HTTPS).
- obn: not everyone can read and only the Release Management Team can push here (forced over HTTPS).
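The four zones could map onto the `[web]` section of each repository's `.hg/hgrc`, using the `allow_read`, `allow_push` and `push_ssl` options of Mercurial's web interface. The sketch below only prints such a section. The function name and the user lists are hypothetical, the reader lists for stable and obn are assumptions, and authentication itself would be handled by the web server.

```shell
# Sketch: print the [web] section of a repository's .hg/hgrc for one zone.
# zone_web_section and all account names are hypothetical.
zone_web_section() {
    zone=$1
    developers="alice, bob"   # hypothetical developer accounts
    release_team="carol"      # hypothetical Release Management account
    customers="dave"          # hypothetical Openbravo Network customer account
    case $zone in
        devel)   readers="*";                         pushers=$developers ;;
        stable)  readers=$developers;                 pushers=$developers ;;
        release) readers="*";                         pushers=$release_team ;;
        obn)     readers="$customers, $release_team"; pushers=$release_team ;;
        *)       echo "unknown zone: $zone" >&2; return 1 ;;
    esac
    # allow_read = * lets anyone read; push_ssl = true forces pushes over HTTPS.
    printf '[web]\nallow_read = %s\nallow_push = %s\npush_ssl = true\n' \
        "$readers" "$pushers"
}
```

Because these options take user names rather than groups, a helper of this kind is one possible form of the "simple scripts to manage access through groups" mentioned above.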
Structure of repositories
URL scheme: http://code.openbravo.com/[project]/[stage]/[subproject]
List of projects:
- erp: Openbravo ERP.
- pos: Openbravo POS.
- tools: Tools.
Common stages:
- devel: development.
- stable: stabilization branches.
- release: community releases.
- obn: all the releases.
Example Openbravo ERP URLs:
- http://code.openbravo.com/erp/devel/main
- http://code.openbravo.com/erp/devel/modularity
- http://code.openbravo.com/erp/stable/2.3x
- http://code.openbravo.com/erp/stable/2.40
- http://code.openbravo.com/erp/release/2.3x (tags: 2.35MP1, 2.35MP5)
- http://code.openbravo.com/erp/release/2.40 (tags: 2.40)
- http://code.openbravo.com/erp/obn/2.3x (tags: 2.35MP1, 2.35MP3, 2.35MP4, 2.35MP5, etc.)
- http://code.openbravo.com/erp/obn/2.40 (tags: 2.40, 2.40MP1)
Example Openbravo POS URLs:
- http://code.openbravo.com/pos/devel/main
- http://code.openbravo.com/pos/devel/fastfood
- http://code.openbravo.com/pos/stable/2.10
- http://code.openbravo.com/pos/stable/2.20
- http://code.openbravo.com/pos/release/2.10 (tags: 2.10, 2.10MP1, 2.10MP2)
- http://code.openbravo.com/pos/release/2.20 (tags: 2.20)
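The URL scheme is regular enough to be generated by a small helper. This is a minimal sketch, assuming the projects and stages listed above are the only valid ones; the function name `repo_url` is hypothetical.

```shell
# Sketch: build a repository URL following [project]/[stage]/[subproject].
# repo_url is a hypothetical helper name.
repo_url() {
    project=$1
    stage=$2
    subproject=$3
    case $project in
        erp|pos|tools) ;;
        *) echo "unknown project: $project" >&2; return 1 ;;
    esac
    case $stage in
        devel|stable|release|obn) ;;
        *) echo "unknown stage: $stage" >&2; return 1 ;;
    esac
    printf 'http://code.openbravo.com/%s/%s/%s\n' "$project" "$stage" "$subproject"
}
```

For example, `hg clone "$(repo_url erp devel main)"` would clone the main Openbravo ERP development repository.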
Release publishing repositories
The idea is to have some read-only repositories for the released versions. These repositories will have different access permissions depending on the type of release:
- [project]/release: this is for all public releases; that is, all major versions and community maintenance packs. Anyone can access these repositories without authentication.
- [project]/obn: This is for private maintenance packs. This will include all the major and minor versions. Only Openbravo Network customers can access these repositories (authentication needed).
When Openbravo used Subversion, the tags were not really tags. Some were modified. Others did not reflect a state or snapshot of the parent repository path. Because of this, we cannot use the natural way of managing tags in Mercurial, at least with the old ones.
Considerations and criteria:
- Ease of automation and administration.
- Ease of browsing through the web interface.
- Tags should be real tags.
- The last tag of every stable repository in Subversion will be the first tag in the equivalent stable repositories of Mercurial (the common tag).
- There are people with installations using tags from Subversion.
Chosen solution:
- Users will use Subversion until they update to the common tag. Afterwards they'll switch to Mercurial and continue updating there.
- All the tags previous to the common tag will be archived as tarballs.
- All the tags subsequent to the common tag will be used as normal Mercurial tags in the stable and main repositories.
Release stabilization branches
With Subversion, whenever we wanted to make a new release, we created a new branch to isolate release stabilization changes from further development in the original branch. For example, for the release of 2.50alpha-r4, we created a branch from trunk called branches/2.50alpha-r4. Developers could continue fixing bugs in trunk and would only fix a bug in the 2.50alpha-r4 branch if it was critical for the release. This meant that the final tag was a snapshot of the stabilization branch and not a snapshot of trunk. This was not a problem in Subversion, because Subversion tags are just new branches. However, a tag in Mercurial is just a label for a revision, and in the previous example those labels should point to revisions in trunk.
Although in theory Mercurial allows creating repositories with just one revision, it's rather ugly, and this is a good opportunity to change the release process to a cleaner approach. So from now on, instead of developing in the original branch and doing stabilization in a new one, the stabilization will be done in the original repository and new developments will be done in clones. After the tag has been created, the clones can be merged back.
Processes
From the book Distributed revision control with Mercurial:
"...it's often good practice to keep a 'pristine' copy of a remote repository around, which you can then make temporary clones of to create sandboxes for each task you want to work on. This lets you work on multiple tasks in parallel, each isolated from the others until it's complete and you're ready to integrate it back. Because local clones are so cheap, there's almost no overhead to cloning and destroying repositories whenever you want."
We recommend this process of using cheap local clones. The general process is:
- Have a pristine copy of the remote repository.
- To work on a repository, make a temporary local clone.
- Do as many atomic commits as needed in that temporary clone.
- Once finished, push directly to the master repository (the remote one).
A developer could have several temporary clones of the same repository to isolate different developments. For example, if a developer is fixing two bugs simultaneously but wants the commits and tests to be isolated, he would use two separate clones.
Developers
Fixing a bug
Example: the developer has to fix issue 6265 in main.
The developer starts by doing a local clone of the branch he wants to fix:
hg clone ~/src/erp/devel/main ~/src/erp/devel/issue6265_main
Then the developer can start fixing and testing the issue in that clone, doing as many commits as needed. Each commit should be an atomic piece.
cd ~/src/erp/devel/issue6265_main
(edit fixme.java)
hg ci -m "Fixes issue 6265."
When the developer is happy with the fix, the changesets can be pushed to the Openbravo repository:
hg push https://code.openbravo.com/erp/devel/main
When finished, he can delete the clone and update his pristine clone:
rm -rf ~/src/erp/devel/issue6265_main
cd ~/src/erp/devel/main
hg pull -u
Backporting a bug fix
Example: the developer has to backport the fix of issue 6265 from devel/main to stable/2.40.
The developer has already fixed the bug in one branch and wants to repeat the fix in another branch. First he clones the branch he wants to fix:
hg clone ~/src/erp/stable/2.40 ~/src/erp/stable/issue6265_2.40
Then he can use transplant to apply the same fix in this branch (4544 is the revision of the commit done to fix this bug in main):
cd ~/src/erp/stable/issue6265_2.40
hg transplant -s ~/src/erp/devel/main 4544
After testing the fix, he can push the update to the Openbravo stable repository and delete the temporary clone:
hg push https://code.openbravo.com/erp/stable/2.40
rm -rf ~/src/erp/stable/issue6265_2.40
The transplant extension is distributed along with Mercurial. Another possibility to backport changes is to use hg export and hg import.
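The export/import alternative can be sketched as a pipe between two local clones. The function name `backport_with_export` is hypothetical, and the paths in the usage note are the ones from the example above.

```shell
# Sketch: copy one changeset between two local clones with export/import.
# backport_with_export is a hypothetical helper name.
backport_with_export() {
    src=$1   # clone that already contains the fix
    dest=$2  # temporary clone of the branch to be fixed
    rev=$3   # revision of the fix in the source clone
    # "hg export" writes the changeset as a patch to standard output;
    # "hg import -" reads a patch from standard input and commits it.
    hg -R "$src" export "$rev" | hg -R "$dest" import -
}
```

With the example above, this would be called as `backport_with_export ~/src/erp/devel/main ~/src/erp/stable/issue6265_2.40 4544`. Unlike transplant, this needs no extension to be enabled.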
Developing a new feature
New features can be developed in local repositories. But if other core developers or community members are also contributing to this feature, you want a public repository to share your developments.
Example: the developer wants to start a community project to develop a new feature based on main.
- Create a shared repository based on main.
- Create a local clone as a reference:
hg clone https://code.openbravo.com/erp/devel/modularity ~/src/erp/devel/modularity
- Create a local temporary clone:
hg clone ~/src/erp/devel/modularity ~/src/modularity
- Edit code, committing in atomic pieces. Push to the shared repository to make the changes available to developers and the community.
- Once the feature is developed, merge the feature branch into main.
- The developer already has local copies of both repositories in ~/src/erp/devel/main and ~/src/erp/devel/modularity. So we create a new clone where the merge will be done:
hg clone ~/src/erp/devel/main ~/src/erp/devel/merge_modularity
- Then, we pull changes from modularity branch:
cd ~/src/erp/devel/merge_modularity
hg pull ~/src/erp/devel/modularity
hg merge
(conflict resolution)
hg ci -m "Merged modularity."
Instead of pull+merge, the fetch extension could be used.
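A minimal sketch of the fetch variant follows. It assumes the fetch extension bundled with Mercurial is available and enables it for a single call through `--config`; the function name `merge_with_fetch` is hypothetical.

```shell
# Sketch: pull, merge and commit in one step with the fetch extension.
# merge_with_fetch is a hypothetical helper name.
merge_with_fetch() {
    target=$1  # clone where the merge is done
    from=$2    # repository to fetch the changes from
    # "extensions.fetch=" enables the bundled extension for this call only.
    hg -R "$target" --config extensions.fetch= fetch -m "Merged modularity." "$from"
}
```

For the example above: `merge_with_fetch ~/src/erp/devel/merge_modularity ~/src/erp/devel/modularity`. If the merge has conflicts, they still have to be resolved by hand.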
Once the merge is done, the changesets can be pushed into the centralized main repository and the merge repository can be deleted:
hg push https://code.openbravo.com/erp/devel/main
rm -rf ~/src/erp/devel/merge_modularity
Finally, the modularity repository can be deleted.
New repository for new feature
Sharing changes with other developers is an easy task with SSH or using the built-in web server. However, there are situations where developers want to share the work with other developers or with the community in a centralized manner.
There is no built-in support in Mercurial for server-side copy of a branch (i.e. remote to remote cloning). However, an interactive script could be added to the server to provide this feature. An example session could look like this:
$ ssh $USERNAME@code.openbravo.com 'create erp feature-1'
Clone another ERP branch? Answering no will create an empty repository [Y/N]: Y
Name of the branch to clone: erp/devel/main
...
Your new branch is at http://code.openbravo.com/erp/devel/feature-1
Core developers are allowed to create remote clones of the main repository.
An alternative to SSH is to create our own web interface to manage these requests.
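The non-interactive core of such a server-side script might look like the sketch below. Everything here is an assumption made for illustration: the `HG_ROOT` directory, the function name `create_repo`, and the choice to place new repositories under the devel stage. A real script would also have to check who is allowed to create repositories.

```shell
# Sketch of a server-side "create" helper.
# HG_ROOT, create_repo and the devel/ placement are assumptions.
HG_ROOT=${HG_ROOT:-/srv/hg}

create_repo() {
    project=$1  # e.g. erp
    name=$2     # e.g. feature-1
    base=$3     # optional branch to clone, e.g. erp/devel/main
    new="$HG_ROOT/$project/devel/$name"
    if [ -e "$new" ]; then
        echo "repository already exists: $project/devel/$name" >&2
        return 1
    fi
    mkdir -p "$(dirname "$new")"
    if [ -n "$base" ]; then
        # Server-side copy: a local clone without a working copy.
        hg clone --noupdate "$HG_ROOT/$base" "$new"
    else
        # No base given: create an empty repository.
        hg init "$new"
    fi && echo "Your new branch is at http://code.openbravo.com/$project/devel/$name"
}
```

The session shown earlier would correspond to `create_repo erp feature-1 erp/devel/main`.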
New developer & access controls
Every repository will have one or more developers in charge of it (managers). The Project Managers decide who can pull/push and who cannot.
To add a new developer, an interactive script could be added to the server to provide this feature. An example session could look like this:
$ ssh $USERNAME@code.openbravo.com 'addhguser'
Create a new account? [Y/N]: Y
Developer user name: johndoe
Developer e-mail: john@doe.com
...
New user created. An e-mail has been sent to john@doe.com with the developer credentials.
Equally, to delete a developer account:
$ ssh $USERNAME@code.openbravo.com 'delhguser'
Delete an existing account? [Y/N]: Y
Developer user name: johndoe
This is permanent. Are you absolutely sure? [Y/N]: Y
...
User johndoe deleted.
Finally, the developer in charge of a repository is capable of managing the access controls. An example session could look like this:
$ ssh $USERNAME@code.openbravo.com 'hgacl'
Manage access controls? [Y/N]: Y
You are the manager of the following repositories:
* erp/devel/main
* erp/devel/localization
* erp/devel/my-feature-1
Select a repository: erp/devel/my-feature-1
Developer user name (use 'all' for anonymous): johndoe
Current permissions of johndoe in erp/devel/my-feature-1: +pull -push
Select action: +push
...
Permissions modified for user johndoe in erp/devel/my-feature-1: +pull +push
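One possible building block for such an access control script is a function that adds a user to, or removes a user from, a comma-separated list like the one stored in `allow_push`. Mapping "+push" and "-push" onto that list is an assumption, and `acl_apply` is a hypothetical name. The sketch expects user names without spaces or commas.

```shell
# Sketch: grant or revoke one user in a comma-separated user list.
# acl_apply is a hypothetical helper; lists must not contain spaces.
acl_apply() {
    list=$1    # current list, e.g. "alice,bob"
    action=$2  # "+" to grant, "-" to revoke
    user=$3
    # Remove the user first so that granting twice does not duplicate it.
    rest=$(printf '%s\n' "$list" | tr ',' '\n' \
        | grep -vxF -e "$user" | grep -v '^$' | paste -sd, -)
    case $action in
        +) if [ -n "$rest" ]; then echo "$rest,$user"; else echo "$user"; fi ;;
        -) echo "$rest" ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
}
```

The resulting list could then be written back to the repository configuration by the script that handles the interactive session.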
Release Management
Creating new stable repository
To be done only by Release Management. This could be done directly on the server, or an SSH script or web interface could be created.
Releasing a new version
Example: Release Management wants to release 2.50alpha-r5.
A release is based on a tag. A tag is just a name for a specific revision. The process to make the release is:
- Lock branch. Commits are not accepted during stabilization phase. This is to avoid new instabilities.
- Stabilize. Changes to be done for stabilization should be done in the branch through pulls. Pushes are not allowed.
- Create tag. When the branch is stable, the tag can be created:
hg tag 2.50alpha-r5
- Unlock branch. Development can continue as normal. Pushes are accepted again.
Deleting a repository
To be done only by Release Management. This could be done directly on the server, or an SSH script or web interface could be created. Repository Managers will notify Release Management when a repository is no longer useful.
Server infrastructure
Until now, all the developments were done in a single server, and all the developments related to a project were done in the same repository. For example, the main Openbravo ERP development was done in the openbravo repository, in the trunk path. A trunk feature branch was done in openbravo/branches/feature-1, and a customer branch was located in openbravo/branches/cus_name. Given the centralized nature of Subversion, we were forced to use this approach.
Now, with Mercurial we want to split this into different servers, depending on the purpose. This means that all the developments done by core developers of Openbravo ERP and Openbravo POS will be done in one server. And the repositories used by the Openbravo Global Partner Services will be located in a different server.
Server requirements:
- A 10/100 Mbps network connection.
- Most developers work in Europe, so the server should ideally be located there (latency).
- Proven good connectivity with the Openbravo development HQ in Pamplona.
- Provider allowing custom Linux installations.
- Virtualized (Xen) for easy managing, backup and restores.
- I/O Performance: High.
- Software: Linux, Apache >=2.2.9, Python >=2.5, Mercurial >=1.1.
The number one candidate is Amazon EC2.
Migration
Migration tool
The migration tool should satisfy the following requirements:
- Allow migrating an entire repository.
- Allow migrating till a specific revision.
- Allow migrating starting from a specific revision.
- Migrate the entire untouched history.
- Allow mapping user names to real names.
- Allow saving a revision - changeset equivalent table in a text file.
- Manage tags and branches.
- Be reasonably fast (not more than 1 day for a 5GB repository).
Viable tools:
- Convert Extension: included in default Mercurial. It uses the Subversion API and satisfies all the requirements.
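- hgsvn: does not satisfy requirements 2, 3, 5 and 8 of the list above (migrating till a specific revision, migrating starting from a specific revision, mapping user names, and being reasonably fast).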
- svn2hg: not maintained for almost 2 years. Lack of documentation.
So the Convert Extension will be used to do the migration.
This is the command used to migrate trunk:
hg convert --config convert.svn.startrev=1063 file:///mnt/svn-repos/openbravo/trunk /mnt/trunk_migrated /mnt/trunk_migrated.revmap
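The requirement to map user names to real names can be met with the author map of the Convert Extension, a text file with one `svnuser=Real Name <email>` entry per line. The sketch below reuses the trunk command shown above and adds such a file. The mapping entry and the function name are made up for illustration, and the source does not say which mapping was actually used. `--authors` is the option name of that era; newer Mercurial releases call it `--authormap`.

```shell
# Sketch: convert trunk with an author map.
# convert_trunk_with_authors and the mapping entry are hypothetical.
convert_trunk_with_authors() {
    authors=$1  # file that will hold the user name mapping
    cat > "$authors" <<'EOF'
johndoe=John Doe <john@doe.com>
EOF
    hg convert --config convert.svn.startrev=1063 --authors "$authors" \
        file:///mnt/svn-repos/openbravo/trunk /mnt/trunk_migrated \
        /mnt/trunk_migrated.revmap
}
```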
To convert a tag without preserving the branch name:
hg convert --config convert.hg.usebranchnames=False file:///mnt/svn-repos/openbravo/tags/r2.35mp5 /mnt/hg-repos/erp/release/2.35mp5
It takes 45 minutes. The maintenance branches, DBSourceManager and Openbravo POS take around 15 minutes.
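The revmap file written by the conversion can serve as the revision to changeset equivalence table. The sketch below looks up the Mercurial changeset for a given Subversion revision. It assumes each revmap line holds a source identifier ending in `@REVISION`, a space, and the Mercurial changeset id; the function name is hypothetical.

```shell
# Sketch: find the Mercurial changeset for a Subversion revision number.
# Assumes revmap lines of the form "svn:<uuid>/<path>@<rev> <changeset id>".
hg_changeset_for_svn_rev() {
    revmap=$1  # e.g. /mnt/trunk_migrated.revmap
    svnrev=$2  # Subversion revision number
    awk -v rev="$svnrev" '
        { n = split($1, parts, "@") }
        n > 1 && parts[n] == rev { print $2; exit }
    ' "$revmap"
}
```

For example, `hg_changeset_for_svn_rev /mnt/trunk_migrated.revmap 1063` would print the changeset id recorded for revision 1063, or nothing if that revision was not converted.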
What
What will be migrated:
- Openbravo ERP:
- trunk: at the beginning all the commits were automatic, and a daily database binary dump was uploaded to the repository. It does not make sense to migrate this. Because of this, the migration will start at revision 1063 and finish on HEAD.
- branches: all the maintenance branches will be migrated with their entire history. All the active community branches will be migrated with their entire history. The rest of the branches won't be migrated.
- tags: all the release tags will be migrated to /release.
- localization:
- Openbravo POS:
- trunk:
- localization:
- branches:
- tags:
- localization:
- Customer branches:
- From now on, customer code is no longer going to be in the same server as Openbravo ERP core development. Consulting will have its own server and will manage its customers' code autonomously. We will migrate all customer branches to this server.
- BI-solutions(?)
- DBSourceManager(?)
- Setup tool
Milestones and tentative schedule
ID | Milestone | Owner | When | Status |
1 | Prepare the test Mercurial machine | Jpabloae | 17/12/2008 | Done |
2 | Test the processes with volunteered developers | Jpabloae | 19/12/2008 | Done |
3 | Port the current Subversion-Mantis integration to Mercurial | Jaimetorre | 20/01/2009 | Done |
4 | Test and update development documentation using Mercurial Eclipse | Jaimetorre | 10/02/2009 | Done |
5 | Write a Mercurial at Openbravo guide | Jpabloae | 10/02/2009 | Done |
6 | Set up final server in EU | Jpabloae | 12/02/2009 | Done |
7 | Migrate ERP's trunk and stable branches | Jaimetorre | 24/02/2009 | Done |
8 | Set up e-mail notification hooks in mailing lists and in CIA | Jpabloae | 24/02/2009 | Done |
9 | Announce migration to Mercurial | Jaimetorre | 25/02/2009 | Done |
10 | Do internal training sessions | Jpabloae | 25/02/2009 | Done |
11 | Set up server remote backups | Jpabloae | 25/02/2009 | Done |
12 | Migrate all GPS repositories to Mercurial | Jpabloae | undefined | Not started |
Open issues
- Which collaboration model are we going to use?
- Pull only (Linux kernel model).
- Shared push.
- Which subdomain are we going to use?
- dev.openbravo.com
- code.openbravo.com
Terminology
- SCM: Source Code Management.
- DSCM: Distributed Source Code Management.
- Repository: A remote copy of a source tree with revision history.
- Working Copy: Also known as working directory. It's the top-level directory in a repository, in which the plain versions of files are available to read, edit and build.
- Changeset: It's a collection of all the changes that lead to a new revision of the repository.
- clone: a complete copy of a repository.
- pull: a pull propagates changesets from a "remote" repository (the source) into a local repository (the destination).
- update: it updates the working directory from the repository.
- commit: commit changes to the given files into the repository.
- push: to send all differing revisions to another repository.
- tag: to name a particular revision.
- branch: a diverging point in time in the revision history.
- Development branch: a branch of the main repository, usually used to add a new feature to the product.
- Stable branch: a stabilization or maintenance branch of a major release.
- Release stabilization branch: a branch used by Openbravo in Subversion to stabilize an upcoming release and to avoid freezing trunk.
- trunk: name used in Subversion to indicate the branch where the main developer work is done.
- main: name used by Openbravo in Mercurial to describe the repository where the main developer work is done.
References
- Mercurial official wiki.
- Distributed revision control with Mercurial (hgbook).
- Documentation on Source Code Management for OpenSolaris.
- DVCS PEP