Dual Syncing ClearCase and Git

22 October 2013

Summary

Importing the history from ClearCase into Git is interesting, but how can you actually transition a large development organization? Creating a dual sync between ClearCase and Git will allow individuals and teams to cut over on their own schedule, providing a smooth transition period.

Background

Just like with importing history, there are commercial solutions for dual syncing. However, they are inadequate if you want to create usable history from your sync.

Design Considerations

In the case of conflicts, fail loudly and resolve manually.
Aggressively detect any potential conflicts; never guess.
Sync on a per-branch basis, not a per-repository basis. This greatly narrows the scope and risk of the sync.
Preserve history in Git beautifully. The Git history should appear native.
ClearCase labels are not supported; a dual-synced branch must be tagged in Git.

Initially, we designed our dual sync to be a temporary solution, but it ended up working so well that we left it on for the duration of a few branches. An argument could be made to “rip the band-aid off quickly”, but the reality in a corporate development environment is that management is not willing to accept any schedule risk from a source control cutover.

Architecture

We developed two independent pieces of software. Each is necessarily aware of the other, but they communicate only through the Git and ClearCase repositories themselves.

ClearCase to Git

ClearCase to Git relies heavily on our Git import script. The import script is designed to work on multiple branches and to process them on a label-by-label basis. A few features are needed before it can be used in real time, but we get most of the code that generates beautiful Git commits for free.

First, we cannot rely on labels. Instead, we need a dynamic view that points to a live branch. We can use the same algorithm to detect changes (rsync + git status), and then we can reuse the commit generator.

However, because we don’t use ClearCase atomic commits, we have to concern ourselves with importing ClearCase changes that aren’t finished. For example, a ClearCase commit can take 30 seconds per file, so a large commit can take an hour. It should still be imported into Git as a single, atomic commit. The solution is to wait for a given ClearCase commit to become “seasoned” before importing it into Git. In practice, this means skipping changes that are too young and waiting until they are, for example, 15 minutes old before bringing them into Git.
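The seasoning check can be sketched in a few lines of Python. This is a minimal illustration, not our production script: the function name is_seasoned and the way version timestamps are collected are assumptions, and the 15-minute threshold is only the example value mentioned above. It treats a batch of pending ClearCase changes as ready once its newest version is old enough.

```python
from datetime import datetime, timedelta, timezone

# Example threshold only; tune it to how long large check-ins take.
SEASONING_PERIOD = timedelta(minutes=15)


def is_seasoned(version_times, now, period=SEASONING_PERIOD):
    """Return True when every pending ClearCase version is at least `period` old.

    `version_times` is a list of creation times for the versions that have
    not been imported yet (how they are collected is left out of this sketch).
    """
    if not version_times:
        return False  # nothing pending, so nothing to import
    newest = max(version_times)
    # If the newest version is still young, the check-in may be in progress.
    return now - newest >= period


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    pending = [now - timedelta(minutes=40), now - timedelta(minutes=5)]
    print(is_seasoned(pending, now))
```

If the check fails, the sync simply does nothing on this pass and looks again on the next run.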

The other change that the import script needs is conflict detection. When doing an import, it does not matter what is on the destination - we can freely make the destination (Git) look like the source (ClearCase), because ClearCase is always right. However, in dual sync, both sides must be respected.

Instead of performing straight file copies into Git, we can create patch files from ClearCase changes and then apply the patches into Git. If the patches apply cleanly, we can assume there are no conflicts. The patch is generated by using cleartool get against the predecessor version of a file element and the current version, copying them into a temporary location, and then running diff to generate a patch.

If patch exits cleanly, we’re good. If it does not, we need conflict resolution (see below).
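As a rough sketch of the diff/patch step, the Python below assumes the predecessor and current versions have already been extracted to temporary files (the cleartool step is omitted) and that the standard diff and patch command-line tools are installed. The function names make_patch and apply_patch are made up for illustration. A dry run is done first so that a failed patch leaves the destination untouched.

```python
import subprocess
from pathlib import Path


def make_patch(predecessor: Path, current: Path) -> str:
    """Return a unified diff between two extracted versions of a file."""
    result = subprocess.run(
        ["diff", "-u", str(predecessor), str(current)],
        capture_output=True,
        text=True,
    )
    # diff exits 0 when the files match, 1 when they differ, 2 on trouble.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def apply_patch(patch_text: str, target: Path) -> bool:
    """Apply the patch to `target`; return False if it does not apply cleanly."""
    if not patch_text:
        return True  # no differences, nothing to do
    base = ["patch", "--force"]  # --force: never prompt in an unattended run
    dry_run = subprocess.run(
        base + ["--dry-run", str(target)],
        input=patch_text,
        capture_output=True,
        text=True,
    )
    if dry_run.returncode != 0:
        return False  # conflict: leave the target alone
    subprocess.run(
        base + [str(target)],
        input=patch_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return True
```

A False return value is the signal to stop and hand the change over to manual conflict resolution.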

Lastly, we append a token string to all of the ClearCase-imported Git commits so that the Git to ClearCase script (git2cc) knows to exclude them. Likewise, git2cc adds a token to the ClearCase versions it creates, and the ClearCase to Git script (cc2git) ignores those changes.

Git to ClearCase

Going from Git to ClearCase is slightly easier than cc2git. Determining what to import is trivial: simply look for Git commits that do not have the “Imported from ClearCase” token and have not already been imported into ClearCase.
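The selection rule is simple enough to show directly. In this sketch the commits are passed in as (sha, message) pairs, oldest first, and the set of already-imported hashes is assumed to be tracked somewhere by git2cc; both of those details, and the function name, are illustrative assumptions.

```python
IMPORT_TOKEN = "Imported from ClearCase"


def commits_to_export(commits, already_exported):
    """Pick the Git commits that still need to go to ClearCase.

    `commits` is an iterable of (sha, message) pairs, oldest first.
    `already_exported` is a set of hashes git2cc has already handled.
    """
    return [
        sha
        for sha, message in commits
        if IMPORT_TOKEN not in message and sha not in already_exported
    ]


if __name__ == "__main__":
    history = [
        ("a1", "Fix parser crash"),
        ("b2", "Update build files\n\nImported from ClearCase"),
        ("c3", "Add retry logic"),
    ]
    print(commits_to_export(history, already_exported={"a1"}))
```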

Importing the commits is also done using diff/patch to detect conflicts. Besides that, it is just a matter of automating the ClearCase commit mechanics and adding a token so that cc2git will ignore the changes.

We don’t particularly care about other metadata in ClearCase. After all, the point of this exercise is to migrate off of ClearCase. It is counterproductive to make the ClearCase data too pretty.

Binary Files

For file types that diff/patch does not support, we simply copy over the change. This could be made more robust by validating that the previous version of the binary file matches the current version on the destination, but we determined that in our environment it was not necessary. This is a known race condition which we chose not to handle.
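For anyone who does want the extra validation, here is one way it might look. To be clear, this is the check we chose not to implement, so treat it as a hypothetical sketch: it compares a SHA-256 digest of the predecessor version against the file currently on the destination and only copies when they match.

```python
import hashlib
import shutil


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_binary_if_unchanged(predecessor, current, destination):
    """Copy `current` over `destination` only if the destination still
    matches `predecessor`; return False to signal a conflict otherwise."""
    if file_digest(predecessor) != file_digest(destination):
        return False  # destination changed on the other side
    shutil.copyfile(current, destination)
    return True
```

A False result would be handled exactly like a patch that fails to apply.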

Conflict Detection and Resolution

Since all changes are imported using diff/patch, if a patch does not apply cleanly, there is a conflict.

If we hit a conflict from either side (cc2git or git2cc), the action is the same: send an email notice and exit. We determined that conflicts would be rare and that it is OK to punt resolution to a manual process. Our developers have been warned to coordinate changes with their colleagues to avoid conflicts, and in practice conflicts have been very rare.

However, we have to be prepared to resolve a conflict. To avoid infinite loops, we created a secret token that can be used to indicate that a manual conflict resolution has occurred. If either side detects this token on a change, it imports that change with a plain copy (cp) instead of diff/patch.
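Put together, the decision each side makes for a single change looks roughly like the sketch below. The token value shown is a placeholder (the real one is not given here), and import_change, SyncConflict, and the two callables are hypothetical names used only for illustration. The caller is expected to catch the exception, send the email notice, and exit.

```python
# Placeholder value; the real token is whatever string the team agrees on.
RESOLUTION_TOKEN = "[manual-conflict-resolution]"


class SyncConflict(Exception):
    """Raised when a change cannot be imported without guessing."""


def import_change(comment, apply_patch, copy_over):
    """Import one change, choosing between patch and plain copy.

    `comment` is the commit message or ClearCase comment of the change.
    `apply_patch` is a callable returning True if the patch applied cleanly.
    `copy_over` is a callable that overwrites the destination file.
    """
    if RESOLUTION_TOKEN in comment:
        # A human already resolved this conflict: trust the source as-is.
        copy_over()
        return "copied"
    if apply_patch():
        return "patched"
    raise SyncConflict("patch did not apply cleanly; manual resolution required")
```

Because a manually resolved change is copied rather than patched, it cannot fail in the same way again, which is what breaks the loop.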

Summary

Dual syncing between ClearCase and Git is not easy. It required a bit of custom code and some planning, but the result is a pain-free Git migration.