Conversion of SVN repository to Git

Migration of a Subversion (SVN) repository to the Git distributed version control system.

Unlike the CVS-to-Git migration, this guide does not use external scripts (such as svn2git), because Git's built-in SVN support works reasonably well.

For the following steps, it is recommended to use the Git Bash command line on Windows (so that the emulated Linux commands are available).

Step 1: Extract author names and prepare the author transformation file

First of all, we are going to prepare the authors transformation file. As with CVS, Subversion only stores a single user nickname, whereas Git uses the full user name and the e-mail address.

Check out the latest version of the repository you want to migrate (if you are migrating an SVN repository with multiple projects, check out the repository root or at least a path that covers all the migrated projects):

$ svn co <the_repository_path_or_url> SvnMig
$ cd SvnMig

Next, run the following command in the root of the SVN working copy to extract all the user names (copy the entire multi-line command):

$ svn log -q |
awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' |
sort -u > ../authors-transform.txt

$ cd ..

The command will create the file "authors-transform.txt", which contains all the unique SVN user names in the following format:

username1 = username1 <username1>
username2 = username2 <username2>
...

Now edit the file to map each SVN user name to a full Git user name and e-mail address, for example:

username1 = First1 Last1 <user1@example.com>
username2 = First2 Last2 <user2@example.com>
...
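Before moving on, it can help to check that every entry was actually edited. The following is a minimal sketch: the function name "list_unedited_authors" is hypothetical (it is not part of Git or SVN), and it assumes the file was generated by the command above, so entries that were not touched still have the form "name = name <name>":

```shell
#!/bin/sh

# Hypothetical helper (not part of Git or SVN): print the SVN user names whose
# entry in the given authors file still has the generated placeholder form
# "name = name <name>", i.e. entries that have not been edited yet.
list_unedited_authors() {
    # "\1" is a POSIX basic regular expression back-reference to the user name
    sed -n 's/^\([^ ][^ ]*\) = \1 <\1>$/\1/p' "$1"
}
```

For example, "list_unedited_authors authors-transform.txt" prints nothing once every entry has been replaced by a full name and e-mail address.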

At this point the SvnMig directory can be removed; it is not needed for the following steps.

Step 2: Set up the Git repository for the SVN import

Create an empty Git repository (make sure you have first left the SVN working copy with "cd .."). Again, use the repository root path or the full project path, as in Step 1:

$ mkdir SvnGit
$ cd SvnGit
$ git init
$ git svn init <the_repository_path_or_url> -s --prefix=svn-origin/ --no-metadata

The "git svn init" command sets up an SVN remote location in the configuration of the new Git repository. It only configures the SVN remote; it does not start the migration process yet.

The "-s" ("--stdlayout") parameter tells git svn that the SVN path uses the standard layout, i.e. a single project with the standard "trunk", "branches" and "tags" subdirectories.

The "--prefix" value is prepended to the remote branch names, so the remote SVN branches will be named "remotes/svn-origin/trunk", "remotes/svn-origin/branchXXX" etc. Use of this parameter is highly recommended: Git versions before 2.0 used no prefix by default, so the remote branches would just be named "remotes/trunk", "remotes/branchXXX" etc. (since Git 2.0 the default prefix is "origin/").

The "--no-metadata" option tells git svn not to record the original SVN references (the "git-svn-id:" lines) in the Git commit messages. That is fine for one-time migrations (where you usually don't need the SVN metadata), but discouraged for incremental SVN imports (Git can cooperate with a remote SVN repository and regularly fetch from it and commit back to it).

Optionally, use the "-R <name>" option to give the SVN remote a name other than the default "svn".

The ".git/config" file of the new Git repository will then contain a section similar to this:

[svn-remote "svn"]
    noMetadata = 1
    url = <the_remote_url>
    fetch = <project_path>/<project_name>/trunk:refs/remotes/svn-origin/trunk
    branches = <project_path>/<project_name>/branches/*:refs/remotes/svn-origin/*
    tags = <project_path>/<project_name>/tags/*:refs/remotes/svn-origin/tags/*

The part before the colon (e.g. "<project_path>/<project_name>/trunk") is the SVN path, and the part after the colon ("refs/remotes/svn-origin/trunk") is the Git ref it is mapped to. The asterisk is a wildcard.
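To illustrate how such a line turns SVN paths into Git refs, here is a small sketch that applies a single mapping to an SVN path. It is a hypothetical helper for reasoning about the configuration, not git svn's own code, and it assumes that either both sides of the colon contain exactly one "*" or neither does:

```shell
#!/bin/sh

# Hypothetical illustration (not git svn code): apply one mapping such as
# "branches/*:refs/remotes/svn-origin/*" to an SVN path and print the Git ref.
# Assumes both sides contain exactly one "*", or neither side does.
map_svn_path() {
    refspec=$1
    svn_path=$2
    src=${refspec%%:*}   # SVN side (before the colon)
    dst=${refspec#*:}    # Git side (after the colon)
    case $src in
        *\**)
            src_head=${src%%\**}   # text before the "*"
            src_tail=${src#*\*}    # text after the "*"
            # the SVN path must start with src_head and end with src_tail
            case $svn_path in
                "$src_head"*"$src_tail") ;;
                *) return 1 ;;
            esac
            # the part of the SVN path matched by "*"
            match=${svn_path#"$src_head"}
            match=${match%"$src_tail"}
            # substitute it for the "*" on the Git side
            printf '%s%s%s\n' "${dst%%\**}" "$match" "${dst#*\*}"
            ;;
        *)
            # no wildcard: the SVN path must match exactly
            [ "$svn_path" = "$src" ] || return 1
            printf '%s\n' "$dst"
            ;;
    esac
}
```

For example, "map_svn_path 'branches/*:refs/remotes/svn-origin/*' branches/feature1" prints "refs/remotes/svn-origin/feature1", and a path that does not match the SVN side makes the function return a non-zero status.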

Step 3: Fetch the SVN repository

The next step performs the actual migration, which is done simply by fetching the remote SVN repository. Note that the authors transformation file is used here to translate the SVN user names into the Git user name format:

$ git svn fetch --fetch-all -A ../authors-transform.txt

Once the fetch has started, you can take a (long) coffee break: it will take a while, especially on large SVN repositories.

Step 4: Finalize the migrated repository

At this point the SVN repository is migrated in principle, but you might want to check whether the conversion was correct and make some other adjustments, for example removing empty commits.

Also, check the new repository and make sure that all branches, tags, etc. were properly converted (some non-standard SVN repository layouts might cause trouble).
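One way to check the branches is to compare the SVN directory listing with the refs git svn created. The following is a minimal sketch; the function name is hypothetical, and it assumes the branch names contain no characters that git svn escapes in ref names (such as spaces):

```shell
#!/bin/sh

# Hypothetical helper: print SVN branch (or tag) names that have no matching
# Git ref. Assumes the names contain no characters git svn escapes (e.g. spaces).
#   $1: file with the output of "svn ls <url>/branches" (one "name/" per line)
#   $2: file with the output of "git for-each-ref --format='%(refname)' <prefix>"
#   $3: ref prefix to strip, e.g. "refs/remotes/svn-origin/"
missing_branches() {
    svn_names=$(mktemp)
    git_names=$(mktemp)
    # strip the trailing "/" that "svn ls" prints after directory names
    sed 's|/$||' "$1" | sort -u > "$svn_names"
    # strip the ref prefix so that only the branch name remains
    sed "s|^$3||" "$2" | sort -u > "$git_names"
    # print the names that only appear in the SVN list
    comm -23 "$svn_names" "$git_names"
    rm -f "$svn_names" "$git_names"
}
```

For example, save the output of "svn ls <the_repository_path_or_url>/branches" and of "git for-each-ref --format='%(refname)' refs/remotes/svn-origin/" to two files and pass them together with the prefix "refs/remotes/svn-origin/"; for tags, list "<the_repository_path_or_url>/tags" and use the prefix "refs/remotes/svn-origin/tags/" instead. Extra refs on the Git side (such as "trunk") are not reported.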

Before removing the SVN remote locations from the configuration file (for one-time migrations), you'll need to create local branches in the repository and convert the SVN tags (which are basically branches) into real Git tags. You might use these commands:

#!/bin/sh

# create a local branch for each SVN remote-tracking branch
git for-each-ref --format='%(refname)' refs/remotes/svn-origin/ |
while read -r ref; do
    # remove "refs/remotes/svn-origin/" from the local branch name and
    # replace "%20" (git svn's encoding of spaces) with underscores
    name="$(echo "${ref#refs/remotes/svn-origin/}" | sed 's/%20/_/g')"
    git branch "$name" "$ref"
done

# convert the SVN tag branches to real Git tags
git for-each-ref --format='%(refname)' refs/heads/tags/ |
while read -r ref; do
    tag_branch="${ref#refs/heads/}"   # e.g. "tags/v1.0"
    tag_name="${tag_branch#tags/}"    # remove "tags/" from the tag name
    git tag -a -m "Tagged: $tag_name" "$tag_name" "$ref"
    # remove the tag branch
    git branch -D "$tag_branch"
done

# EOF

After you've created the local branches, you can remove the SVN remote definition(s) from the ".git/config" file. The remote SVN branches can be removed from ".git/refs/remotes" (note that refs may also have been packed into ".git/packed-refs"). The standard "git remote prune <name>" does not help here, because the SVN remotes are not regular Git remotes.
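The cleanup can also be done with Git commands instead of editing files by hand, which also covers packed refs. A minimal sketch, assuming the default SVN remote name "svn" and the "svn-origin/" prefix used above; the function name is hypothetical:

```shell
#!/bin/sh

# Hypothetical helper: after the local branches and tags have been created,
# delete the SVN remote-tracking refs and the [svn-remote "<name>"] section.
#   $1: SVN remote name (default "svn")
#   $2: ref prefix (default "refs/remotes/svn-origin/")
remove_svn_remote() {
    remote=${1:-svn}
    prefix=${2:-refs/remotes/svn-origin/}
    # "git update-ref -d" removes loose and packed refs alike
    git for-each-ref --format='%(refname)' "$prefix" |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    # drop the SVN remote definition from .git/config
    git config --remove-section "svn-remote.$remote"
}
```

Run it in the root of the migrated repository. git svn also keeps its own bookkeeping data in the ".git/svn" directory, which is no longer needed after a one-time migration.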

Migration of an SVN repository with multiple projects

If you are migrating an SVN repository that contains multiple projects, the situation might be a bit more complex, depending on the SVN repository layout.

In principle, the original SVN repository can be structured in one of two ways:

Case #1: Multiple projects, each having its own separate trunk/branches/tags

In this case, it depends on whether you want to migrate project-by-project (the easier way: each project into a new Git repository) or multiple projects at once (all projects into a single Git repository). You can of course combine both approaches.

Migrate project-by-project

If migrating project-by-project, just repeat Steps 2 to 4 for each project path (each time with a new target Git repository).

You can automate this by using a shell script, for example:

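#!/bin/sh

# URL or path of the SVN repository (placeholder, replace with your own)
REPO="<the_repository_path_or_url>"

for DIR in Project1 Project2 Project3
do
    # one new Git repository per project; stop on failure
    mkdir "$DIR" && cd "$DIR" || exit 1
    git init
    git svn init "$REPO/$DIR" -s --prefix=svn-origin/ --no-metadata
    git svn fetch --fetch-all -A ../authors-transform.txt
    cd ..
done

# EOF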

This creates a separate Git repository for each project under the current directory. If the projects have different paths in the repository, you will need to adapt the script, for example by mapping project names to repository paths.
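A mapping of project names to repository paths could look like the following sketch. It is hypothetical: it reads "<git_repo_name> <svn_project_path>" pairs (names and paths without spaces) and only prints the commands, so the generated script can be reviewed before it is run:

```shell
#!/bin/sh

# Hypothetical sketch: read "<git_repo_name> <svn_project_path>" pairs from
# standard input and print (rather than run) the migration commands.
REPO="<the_repository_path_or_url>"   # placeholder, replace with your own

print_migration_commands() {
    while read -r name path; do
        # skip empty lines and "#" comments
        case $name in ''|'#'*) continue ;; esac
        printf 'mkdir "%s" && cd "%s" || exit 1\n' "$name" "$name"
        printf 'git init\n'
        printf 'git svn init "%s/%s" -s --prefix=svn-origin/ --no-metadata\n' "$REPO" "$path"
        printf 'git svn fetch --fetch-all -A ../authors-transform.txt\n'
        printf 'cd ..\n'
    done
}
```

For example, feed it lines such as "Library1 libs/library1" and "App2 apps/app2" (e.g. through a here-document), redirect the output to "migrate.sh", check the file, and then run it with "sh migrate.sh".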

Migrate multiple projects

To convert multiple SVN projects into one common Git repository (which is generally not recommended), the ".git/config" file can be edited manually before launching the fetch command so that it handles multiple projects at once (in fact, each project is handled as a separate SVN remote location):

[svn-remote "library1"]
    noMetadata = 1
    url = <the_remote_url>
    fetch = libs/library1/trunk:refs/remotes/libs/library1/trunk
    branches = libs/library1/branches/*:refs/remotes/libs/library1/*
    tags = libs/library1/tags/*:refs/remotes/libs/library1/tags/*
[svn-remote "app2"]
    noMetadata = 1
    url = <the_remote_url>
    fetch = apps/app2/trunk:refs/remotes/apps/app2/trunk
    branches = apps/app2/branches/*:refs/remotes/apps/app2/*
    tags = apps/app2/tags/*:refs/remotes/apps/app2/tags/*
...

Note that each SVN remote location must map its branches to differently named Git refs (in the example above, "refs/remotes/libs/library1/..." and "refs/remotes/apps/app2/...").
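Instead of editing the file by hand, the same sections can be written with "git config". A minimal sketch; the function name is hypothetical, and, like the example above, it uses the SVN project path as the Git ref prefix, which keeps the remote branch names of different projects apart:

```shell
#!/bin/sh

# Hypothetical helper: add an [svn-remote "<name>"] section for a project with
# its own trunk/branches/tags, using the SVN project path as the ref prefix.
#   $1: SVN remote name, e.g. "app2"
#   $2: repository root URL
#   $3: project path inside the repository, e.g. "apps/app2"
add_svn_remote() {
    key="svn-remote.$1"
    git config "$key.noMetadata" 1
    git config "$key.url" "$2"
    git config "$key.fetch" "$3/trunk:refs/remotes/$3/trunk"
    git config "$key.branches" "$3/branches/*:refs/remotes/$3/*"
    git config "$key.tags" "$3/tags/*:refs/remotes/$3/tags/*"
}
```

For example, in the freshly initialized repository call "add_svn_remote library1 <the_remote_url> libs/library1" and "add_svn_remote app2 <the_remote_url> apps/app2", then start the import with "git svn fetch --fetch-all -A ../authors-transform.txt".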

Case #2: Common trunk/branches/tags for all the projects

This structure can easily be imported all at once.

On the other hand, migrating only a single project is more difficult. In that case, you'll need to modify the ".git/config" file manually as follows:

[svn-remote "svn"]
    noMetadata = 1
    url = <the_remote_url>
    fetch = trunk/<project_path>/<project_name>:refs/remotes/svn-origin/trunk
    branches = branches/*/<project_path>/<project_name>:refs/remotes/svn-origin/*
    tags = tags/*/<project_path>/<project_name>:refs/remotes/svn-origin/tags/*

Again, this can also be adapted to import multiple projects at once.
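For example, two projects from the common layout could be imported into one Git repository with two SVN remotes. In this sketch the remote names "project1" and "project2" are placeholders, and each remote maps to its own ref prefix so that the remote branch names do not collide:

```ini
[svn-remote "project1"]
    noMetadata = 1
    url = <the_remote_url>
    fetch = trunk/<project_path>/project1:refs/remotes/project1/trunk
    branches = branches/*/<project_path>/project1:refs/remotes/project1/*
    tags = tags/*/<project_path>/project1:refs/remotes/project1/tags/*
[svn-remote "project2"]
    noMetadata = 1
    url = <the_remote_url>
    fetch = trunk/<project_path>/project2:refs/remotes/project2/trunk
    branches = branches/*/<project_path>/project2:refs/remotes/project2/*
    tags = tags/*/<project_path>/project2:refs/remotes/project2/tags/*
```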

To automate this for a number of projects, a script can use the "git config" command, for example:

#!/bin/sh

# URL or path of the SVN repository root containing trunk/branches/tags
# (placeholder, replace with your own)
REPO="<the_repository_path_or_url>"

for DIR in Project1 Project2 Project3
do
    # one new Git repository per project; stop on failure
    mkdir "$DIR" && cd "$DIR" || exit 1
    git init
    # initialize with default structure
    git svn init "$REPO" -s --prefix=svn-origin/ --no-metadata
    # modify the structure appropriately
    git config svn-remote.svn.fetch "trunk/$DIR:refs/remotes/svn-origin/trunk"
    git config svn-remote.svn.branches "branches/*/$DIR:refs/remotes/svn-origin/*"
    git config svn-remote.svn.tags "tags/*/$DIR:refs/remotes/svn-origin/tags/*"
    # do the import
    git svn fetch --fetch-all -A ../authors-transform.txt
    cd ..
done

# EOF

(If the projects have more complicated paths, again some mapping of project names to repository paths needs to be used.)
