May 18, 2009

feedback about converting eigen2 to mercurial

by orzel
Categories: Admin, KDE
Comments: 6 Comments

This weekend we did the final conversion of the eigen2 source code repository to mercurial. I describe here the few problems we ran into, as feedback to the community.

Eigen's original purpose was to provide linear algebra for several parts of KDE. As such, it was until now developed inside the KDE repository, which (still) uses subversion. Let it be clear that KDE did a wonderful job of hosting the eigen repository, even though several accounts had to be created for people who only contribute to eigen and not to KDE as a whole. Eigen is now big enough to live and release on its own, but it keeps very strong ties with KDE.
Here is the script that was used for the conversion. The version of mercurial (and of the mercurial convert extension) was 1.2.1.

hg convert https://orzel@svn.kde.org/home/kde \
    --authors eigen.authors \
    --config convert.svn.tags=tags/eigen \
    --config convert.svn.branches=branches/eigen \
    --config convert.svn.trunk=trunk/kdesupport/eigen2 \
    eigen.hg
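
The eigen.authors file passed to --authors is a plain text file with one 'svn username=mercurial author' mapping per line. As a minimal sketch (the usernames and addresses below are hypothetical placeholders, not the real eigen contributors), such a file could be generated like this in Python 3:

```python
def format_authors_map(authors):
    """Render svn-username -> Mercurial-author pairs, one 'source=destination' per line."""
    lines = []
    for svn_name, hg_author in sorted(authors.items()):
        lines.append(f"{svn_name}={hg_author}")
    return "\n".join(lines) + "\n"


# Hypothetical entries, for illustration only.
example_authors = {
    "alice": "Alice Example <alice@example.org>",
    "bob": "Bob Example <bob@example.org>",
}

authors_text = format_authors_map(example_authors)

# Writing it out would produce the file given to --authors:
# with open("eigen.authors", "w") as handle:
#     handle.write(authors_text)
```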

We faced several problems. Fortunately, all but one could be resolved by patching mercurial's python code. That would have been a lot more difficult with, for example, git. Speaking of git, we found no easy way to do such a conversion with it, mainly because the KDE repository is so huge and git seems to need to parse the whole history of the whole repository just to convert this one part. Eigen itself is quite small: it took mercurial two hours to convert it, almost all of which was due to network latency between my computer and the KDE repository.

Bad default branch name

In mercurial, the default branch is named ‘default’ (the equivalent of ‘trunk’ in svn). If you browse a mercurial repository (using the wonderful tcl/tk frontend or the web interface), each changeset is labelled with its branch name (and with its tag names, if there are any). There is one exception: if the branch is ‘default’, nothing is displayed, for the sake of clarity. Which is great. With the script presented here, the convert extension creates one branch per subdirectory of branches/, named after that subdirectory, which is fine. However, the main branch, the one called ‘trunk’ in svn and usually called ‘default’ in mercurial, was named ‘eigen2’ here, probably because that is the name of the trunk directory.

I found no way to prevent this using command line arguments. I tried to play with the --filemap option, but failed. The solution I finally used was to patch my local mercurial (yes, I know, this is ugly):

--- /usr/lib/python2.6/site-packages/hgext/convert/subversion.py
+++ hgext/convert/subversion.py
@@ -777,6 +777,8 @@
                 branch = self.module.split("/")[-1]
                 if branch == 'trunk':
                     branch = ''
+                if branch == 'eigen2':
+                    branch = ''
             except IndexError:
                 branch = None
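
To make the effect of this patch concrete, here is a small standalone Python 3 sketch of the naming rule visible in the lines above. It is only a simplified restatement of those few lines, not the convert extension's real code path; the function name and the default_names parameter are my own inventions for illustration.

```python
def branch_from_module(module, default_names=("trunk",)):
    """Take the last path component as the branch name.

    Names listed in default_names are mapped to '' which stands for
    mercurial's 'default' branch.
    """
    branch = module.split("/")[-1]
    if branch in default_names:
        branch = ""
    return branch


# Unpatched behaviour: only 'trunk' is recognised as the main branch.
unpatched = branch_from_module("trunk/kdesupport/eigen2")

# Patched behaviour: 'eigen2' is treated like 'trunk'.
patched = branch_from_module("trunk/kdesupport/eigen2", ("trunk", "eigen2"))

print(repr(unpatched), repr(patched))
```

Since our trunk path ends in ‘eigen2’ and not in ‘trunk’, the unpatched rule never gets a chance to map it to the default branch.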

Bad branch used to save the tags file

In mercurial, tags are created by updating a file called ‘.hgtags’, which maps changesets (identified by their SHA-1 hash) to tag names. At the end of the conversion process, ‘hg convert’ creates such a file and commits it to the mercurial repository. Unfortunately, in our case the file was not committed on the right branch either, so I also had to patch mercurial for this:

--- /usr/lib/python2.6/site-packages/hgext/convert/hg.py
+++ hgext/convert/hg.py
@@ -178,7 +178,8 @@

          self.ui.status(_("updating tags\n"))
          date = "%s 0" % int(time.mktime(time.gmtime()))
-         extra = {'branch': self.tagsbranch}
+#         extra = {'branch': self.tagsbranch}
+         extra = {'branch': 'default'}
          ctx = context.memctx(self.repo, (tagparent, None), "update tags",
                               [".hgtags"], getfilectx, "convert-repo", date,
                               extra)
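
For readers who have never opened a ‘.hgtags’ file: each line holds a 40-character hexadecimal changeset ID, a space, and a tag name, and a later line for the same tag overrides an earlier one. The following Python 3 sketch parses that layout; the changeset IDs and tag names in the sample are made up for illustration and are not taken from the eigen repository.

```python
def parse_hgtags(text):
    """Parse .hgtags content into a {tag name: changeset id} dictionary."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, since tag names may contain spaces.
        node, name = line.split(" ", 1)
        tags[name] = node  # a later entry wins over an earlier one
    return tags


# Made-up sample content, for illustration only.
sample = (
    "0123456789abcdef0123456789abcdef01234567 2.0-beta1\n"
    "89abcdef0123456789abcdef0123456789abcdef 2.0\n"
)

tags = parse_hgtags(sample)
for name, node in sorted(tags.items()):
    print(name, node[:12])
```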

Spurious file from old history

Once the conversion was done, we noticed a file called src/Core, which should not have been there. A diff against a svn checkout confirmed that this was the only difference. I traced the problem back to a bug in svn that allowed a ‘svn copy’ to succeed even when the destination name already existed.

In our case, a directory called ‘Core’ was copied on top of the file ‘Core’. As a result, the underlying file was never deleted, and mercurial kept it. Mercurial actually coped quite well with this.

See this wonderful commit:

http://websvn.kde.org/?view=rev&revision=72256

And this is what the repository looked like just before this commit:

http://websvn.kde.org/branches/work/eigen2/src/?pathrev=72254

We did not manage to fix this problem during the conversion, so we simply removed the file afterwards. In any case, the presence of this file would not have prevented compiling against eigen2.

The bug seems to be fixed in svn 1.6.2: I cannot reproduce it on a small test svn repository. A bug was filed against mercurial with all the details, but since we cannot reproduce this behaviour on a test svn repository, it will be very difficult to solve. We certainly cannot use the KDE repository for testing; it is far too huge for that.
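
The comparison against a svn checkout mentioned above can be approximated with a few lines of Python 3. This is only a rough sketch of the idea, not the exact command I ran: it compares file names and not file contents, the function names are made up, and it ignores the ‘.svn’ and ‘.hg’ metadata directories. Note that a converted repository may also show its ‘.hgtags’ file as an extra entry.

```python
import os


def list_files(root, skip_dirs=(".svn", ".hg")):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version control metadata directories in place.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def only_in_second(first_root, second_root):
    """List files that exist under second_root but not under first_root."""
    return sorted(list_files(second_root) - list_files(first_root))
```

Called as only_in_second(svn_checkout, hg_checkout), with both arguments being paths to working copies of the same revision, it would list files such as our spurious src/Core.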

Conclusion

The eigen2 repository now lives at https://bitbucket.org/eigen/eigen/ and so far everything seems ok. The repository weighs 7.3 MB, which is perfectly reasonable. We can now grant or revoke write access quickly and easily without disturbing the KDE admins. And of course we also get all the wonderful advantages of a DVCS.


6 Comments »

  1. Ian Monroe says:

    Scripts do exist to convert KDE apps to git. Not saying that you should’ve used git or whatever, just don’t want people to get the impression that it’s not possible.

  2. Benoit Jacob says:

    Thanks Thomas for all your work!

    Ian: great, just for reference, where were they? Do they handle branches and tags? (That’s the main issue)

    I didn’t do anything special with git-svn, just created a .git/config, had it reviewed on #git, and ran git-svn init with it; after 6 hours and tons of network traffic, i still had no object in my local directory so I was rather desperate. If I only wanted trunk, on the other hand, it went well.

  3. mat69 says:

    Benoit: afaik you have to “git svn fetch -rAREVNUMBER” (if you choose the last revision you won’t have the old history) afterwards, otherwise nothing should be downloaded, but in your case “git svn clone svn:ssh/…..” would probably be better; you’d have the complete history that way. Still the problem of the authors remains --> see “author file git” (or something like that) on google.

    Depending on what you check out it will take a while. Though you should not clone the whole trunk, that would take really long with all the history.

    In general I like every move away from svn, CVS are just a pain imo. 🙂

    PS.: All that info is right out of my head, so might be wrong. 😉

  4. hron84 says:

    IIRC, you can specify the revision range to git-svn what you want to convert.

  5. Benoit Jacob says:

    Mat69, Hron84: of course I tried that. Didn’t help. Even the exact revisions range corresponding to the lifespan of Eigen in KDE’s SVN, already spans over 300,000 revisions. Yep, KDE’s SVN repo is _that_ big.

    Anyway, Thiago just confirmed on the current kde-devel thread that this was indeed impossible to do without local access to the KDE SVN server (or a dump of its data — worth 35 GB)

  6. Tom says:

    I’ve always been highly impressed with the speed and goodness that is hg convert.

    glad it solved your problem, eigen is a great project, keep up the fantastic work!
