Local Git-/SVN-projects to GitHub

Introduction

This year I decided to move two of my local Git-/SVN-projects to my GitHub account. I started the SVN-project (mccm-project) in early 2012 for MCCM Feldkirch; at that time I basically converted the bespoke MCCM homepage to WordPress. Later, in 2015, I started my own blog, codingcookie.com, also powered by WordPress. By 2015 I was already using Git a lot, so I started a local Git-repo to maintain the codingcookie.com code base (coding-cookie-project). This article describes three things I dealt with while moving both projects to GitHub:

  1. Convert the SVN-project into a Git-repo and maintain the commit history.
  2. Remove data which should not be published: While I maintained the two projects locally, each had a folder called “dumps” in its project root. On a regular basis I made snapshots of the prod databases, dropped them into these folders and committed them to the projects alongside other data. I removed these dump files and the wp-config.php parameters from the projects before I published them. In addition, I removed a few more files from the Git history while working on it. Find more details in the corresponding chapter.
  3. Publish to my GitHub account

The upcoming chapters describe the whole procedure I went through for the mccm-project. For the coding-cookie-project the procedure was the same, except that there was no need to convert it to a Git-repo first.

Convert the SVN-project to a Git-repo

The first step is to prepare a map file that contains the SVN usernames and their Git counterparts. An SVN revision only stores the username alongside the commit message, whereas a Git revision stores the full name and email address alongside the commit message. The map file bridges this gap during conversion.

My authors.txt only contains my user info as I was the only author:
edi = Eden33 <eduard.gopp@gmail.com>
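For a project with more than one committer, the list of usernames can be pulled out of the SVN log instead of being typed by hand. The following is only a sketch: the function name svn_authors_map and the placeholder e-mail domain are assumptions for illustration, and the generated lines still need real names and addresses filled in manually.

```shell
# Turn `svn log --quiet` output (read from stdin) into authors-file lines.
# The e-mail domain is a placeholder assumption; edit the result by hand.
svn_authors_map() {
  awk -F' [|] ' '/^r[0-9]+ [|] / { print $2 }' \
    | sort -u \
    | while IFS= read -r user; do
        printf '%s = %s <%s@example.invalid>\n' "$user" "$user" "$user"
      done
}

# Usage against a real SVN working copy (not run here):
#   svn log --quiet | svn_authors_map > authors.txt

# Self-contained demonstration with canned log output:
sample_log='------------------------------------------------------------------------
r2 | edi | 2012-02-01 09:30:00 +0100 (Wed, 01 Feb 2012)
------------------------------------------------------------------------
r1 | edi | 2012-01-15 10:00:00 +0100 (Sun, 15 Jan 2012)
------------------------------------------------------------------------'
authors_map=$(printf '%s\n' "$sample_log" | svn_authors_map)
echo "$authors_map"
```

Each username appears once in the result, no matter how many revisions it committed.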

The map file together with git svn clone can be used to clone the SVN-project into a Git-repo. If your project is based on the standard SVN-layout (3 folders: branches, tags and trunk) you can use the --stdlayout parameter. If it has a different folder structure you must provide --trunk, --branches and --tags as part of the clone command.

The clone command skeleton:

git svn clone --stdlayout --authors-file=authors.txt [svn-repo]/[project] [git-repo-name]
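For a project that does not follow the standard layout, the clone command could look roughly like the sketch below. The folder names main, releases and stable, the SVN_URL variable and the target directory are assumptions for illustration only and have nothing to do with my projects. The script does nothing unless SVN_URL is set.

```shell
# Hypothetical values: point SVN_URL at your own repository before running.
SVN_URL="${SVN_URL:-}"
TARGET_DIR="${TARGET_DIR:-myproject-git}"

if [ -z "$SVN_URL" ]; then
  clone_status="skipped (SVN_URL not set)"
else
  # --trunk/--branches/--tags take paths relative to the SVN URL.
  git svn clone \
    --trunk=main \
    --branches=releases \
    --tags=stable \
    --authors-file=authors.txt \
    "$SVN_URL" "$TARGET_DIR" \
    && clone_status="cloned" || clone_status="failed"
fi
echo "$clone_status"
```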

My mccm-project had a standard layout. However, the first clone command failed like this:

git svn clone fails

After some research I enabled the svn:// protocol with the help of svnserve.exe, which is available in TortoiseSVN:

using svnserve.exe to enable svn-protocol

Then I changed the protocol in the git svn clone command to svn:// and it worked:

git svn clone

Checking the converted project

Besides the fact that a converted project must be reviewed (comparison of commit messages; branch/tag comparison), here are a few things I came across during conversion.

Empty directories
git svn clone by default doesn’t add empty directories to the cloned project. If you want to add such directories you must specify --preserve-empty-dirs in the clone command.
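The reason behind this is that Git tracks files, not directories. According to the git-svn documentation, --preserve-empty-dirs works around that by creating a placeholder file in each empty directory (named .gitignore by default, changeable with --placeholder-filename). The effect can be reproduced by hand in a throwaway repo; the directory name below is made up for the demonstration.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

mkdir empty-dir
git add .
staged_without=$(git ls-files)     # nothing is staged: Git ignores the empty directory

: > empty-dir/.gitignore           # placeholder file, like the one git svn would create
git add .
staged_with=$(git ls-files)        # now the directory is tracked via its placeholder

echo "without placeholder: '${staged_without}'"
echo "with placeholder:    '${staged_with}'"
```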

New lines
git svn clone doesn’t care about line endings and transfers them exactly as they are stored in SVN (tested with Git version “2.16.1”); see also this post on Stack Overflow. However, in Git it is quite common practice to control right from the very beginning how Git encodes and handles the line endings stored in the repo. For further information there is a good article on the topic.

While I maintained the mccm-project in SVN I was not picky and didn’t care about it. Some of the files in the project were encoded with CRLF (as I created them on my Windows computer) and some were encoded differently because they did not originate on Windows. In the end I decided to set up my Git project more cleanly and normalized the line endings of all text files to LF like this:

echo "* text=auto" >.gitattributes
git add .gitattributes
git add -u --renormalize .
git commit -m "Introduce new line normalization"
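To see what the normalization does, the same steps can be tried in a throwaway repo first. This is a minimal sketch, assuming Git 2.16 or newer (for --renormalize). The directory and file names are made up, and .gitattributes is staged explicitly because -u only updates files that are already tracked. git ls-files --eol shows the line endings in the index (i/) and in the working tree (w/).

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config core.autocrlf false            # keep CRLF untouched for the first commit
git config user.name "Demo User"
git config user.email "demo@example.invalid"

printf 'line one\r\nline two\r\n' > crlf.txt
git add crlf.txt
git commit -q -m "Add file with CRLF line endings"
eol_before=$(git ls-files --eol crlf.txt)

echo "* text=auto" > .gitattributes
git add .gitattributes
git add -u --renormalize .
git commit -q -m "Introduce new line normalization"
eol_after=$(git ls-files --eol crlf.txt)

echo "before: $eol_before"
echo "after:  $eol_after"
```

Only the copy stored in the repo changes. The file in the working tree keeps its CRLF endings until it is checked out again.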

 

Remove data from revision history

As already explained in the “Introduction”, I removed more than just the dump files and the wp-config.php parameters. The mccm-project initially took up approx. 5 GB with all images and galleries included. At the time of writing, GitHub had a disk quota of 1 GB. Therefore I decided to also remove the image folders. The list of data removed from revision history:

  • <project root>/.buildpath
  • <project root>/.project
  • <project root>/wp-config.php (cleaned up parameters)
  • <project root>/dumps
  • <project root>/wp-content/gallery
  • <project root>/wp-content/uploads

To get this job done I used BFG Repo Cleaner, a simpler and faster alternative to the git filter-branch command. BFG by default leaves the content of the latest commit untouched, so the last commit in HEAD must already contain all the changes you want to apply to the history with the tool afterwards. This means I first deleted all the folders and files mentioned and removed all sensitive data from wp-config.php. I committed this to HEAD (0ff5c1e). After that I executed the following on the command line:

BFG Repo Cleaner in action
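A BFG run for the list above might look roughly like the following sketch. It is not a transcript of my run: the jar location, the repo directory and the secrets.txt file (one string to blank out per line, used with --replace-text) are assumptions for illustration. Keep in mind that BFG matches file and folder names, not paths, so a folder name like gallery is removed wherever it occurs in the tree. The script skips the run if the jar or the repo is not found.

```shell
BFG_JAR="${BFG_JAR:-bfg.jar}"                 # assumed location of the downloaded BFG jar
REPO_DIR="${REPO_DIR:-mccm-project}"          # assumed path of the converted Git-repo
SECRETS_FILE="${SECRETS_FILE:-secrets.txt}"   # hypothetical file: one string to replace per line

if [ -f "$BFG_JAR" ] && [ -d "$REPO_DIR" ]; then
  # Name-based matching: these globs hit every folder/file with that name.
  set -- --delete-folders "{dumps,gallery,uploads}" --delete-files "{.buildpath,.project}"
  if [ -f "$SECRETS_FILE" ]; then
    set -- "$@" --replace-text "$SECRETS_FILE"
  fi
  java -jar "$BFG_JAR" "$@" "$REPO_DIR" && bfg_status="done" || bfg_status="failed"
else
  bfg_status="skipped"
fi
echo "BFG run: $bfg_status"
```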

Finally I executed the git reflog/gc commands (as requested in the last screenshot) to delete the cleaned-up data from my local working copy.
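The clean-up commands suggested in the BFG documentation, plus a quick way to check whether a path still shows up anywhere in history, can be tried in a throwaway repo. This is a minimal sketch: the helper name commits_touching and the demo files are made up for illustration. In this demo no history rewrite takes place, so the dumps folder is still reported; after a successful BFG run the same check should print nothing for the removed paths.

```shell
# List commits on any ref that still touch the given paths.
# Empty output means the paths are gone from history.
commits_touching() {
  git log --all --format='%h %s' -- "$@"
}

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.invalid"

mkdir dumps
echo "-- fake dump" > dumps/prod.sql
echo "<?php // demo" > index.php
git add .
git commit -q -m "Initial import"
git rm -q -r dumps
git commit -q -m "Remove dumps from HEAD"

# Drop reflog entries, then prune objects that are no longer reachable.
git reflog expire --expire=now --all
git gc --quiet --prune=now --aggressive

still_there=$(commits_touching dumps)                # deleted in HEAD, but still in history
never_there=$(commits_touching wp-content/uploads)   # never committed in this demo
echo "dumps:   ${still_there:-<none>}"
echo "uploads: ${never_there:-<none>}"
```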

Publish to my Github account

It’s straightforward. First I created a project on GitHub and copied the remote URL provided in the GitHub web interface. The following screenshot shows the whole process.

Change origin to Github repo and push the history
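In text form, the steps could look like the sketch below. So that it runs offline, a local bare repo stands in for the GitHub remote; in practice remote_url would be the URL copied from the GitHub web interface. All paths, the demo commit and the tag name are assumptions for illustration.

```shell
work_dir=$(mktemp -d)
remote_url="$work_dir/github-stand-in.git"   # replace with the URL copied from GitHub
git init -q --bare "$remote_url"

# A tiny demo project instead of the real, cleaned-up repo.
git init -q "$work_dir/project"
cd "$work_dir/project"
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo "demo" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git tag v1.0

# Point origin at the new remote: add it if missing, otherwise change its URL.
if git remote get-url origin >/dev/null 2>&1; then
  git remote set-url origin "$remote_url"
else
  git remote add origin "$remote_url"
fi

git push -q origin --all    # all local branches
git push -q origin --tags   # all tags

pushed_refs=$(git ls-remote origin)
echo "$pushed_refs"
```

Pushing branches and tags separately makes sure the tags taken over from SVN end up on the remote as well.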

Conclusion

Converting a project from SVN to Git is not hard. As explained in this article, there are some things you should be aware of when you deal with git svn clone. From my point of view the new line normalization is a hot topic if you haven’t taken care of it previously in SVN. Furthermore, you should be careful when you work with BFG Repo Cleaner. I can confirm it’s super fast. However, your input must be correct to end up with a correct history rewrite. So always check the log output on the console while executing BFG Repo Cleaner and triple-check the end result (= Git history).