[XML-Projects] xmldiff - AMD64 fix and general comments

Daniel Hottinger hodaniel at student.ethz.ch
Mon Jun 27 10:12:30 CEST 2005


Hi,

We had a closer look at the xmldiff script on your website[1].

Comments:

- Move detection does not work because you omit step 2(e) of the
  FastMatch algorithm.

- Storing the matching as a list of tuples slows the script
  down drastically. If possible (I'm not a Python expert), you
  should use dictionaries to look up the partners. I suggest two:
  one mapping tree1 -> tree2 and one for the opposite direction.
  As part of a semester thesis we wrote an implementation of FMES
  in Java that handles two 820k bookmark files in 20 seconds,
  whereas the Python script runs for almost 10 minutes.

- In fmes_node_equal, it would probably be faster to traverse
  one subtree (the smaller one) instead of the whole mapping.

- maplookup.c is not 64-bit clean. You should /NEVER/ cast a
  pointer to an int! A patch against version 0.6.7 for this
  issue is attached to this mail. It has only been tested on an
  AMD64 machine running in 64-bit mode (Linux 2.6.10, x86_64,
  Python 2.3).
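
To make the two-dictionary suggestion and the subtree traversal
concrete, here is a minimal Python sketch. It is not xmldiff's
code: Node, Matching, descendants, is_descendant and count_common
are hypothetical names, and I am assuming that fmes_node_equal
essentially counts the matched pairs lying under both nodes.

```python
class Node:
    """Hypothetical tree node (not xmldiff's real node representation)."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


class Matching:
    """Bidirectional matching stored in two dictionaries.

    Nodes are used as keys directly; without a custom __eq__ they hash
    by identity, so each lookup takes constant time on average.
    """

    def __init__(self):
        self._to_tree2 = {}  # node in tree1 -> partner in tree2
        self._to_tree1 = {}  # node in tree2 -> partner in tree1

    def add(self, node1, node2):
        self._to_tree2[node1] = node2
        self._to_tree1[node2] = node1

    def partner_in_tree2(self, node1):
        return self._to_tree2.get(node1)

    def partner_in_tree1(self, node2):
        return self._to_tree1.get(node2)


def descendants(node):
    """Yield every node strictly below `node`."""
    stack = list(node.children)
    while stack:
        current = stack.pop()
        yield current
        stack.extend(current.children)


def is_descendant(node, ancestor):
    """Return True if `ancestor` lies on the parent chain of `node`."""
    current = node.parent
    while current is not None:
        if current is ancestor:
            return True
        current = current.parent
    return False


def count_common(matching, node1, node2):
    """Count matched pairs (x, y) with x below node1 and y below node2.

    Only node1's subtree is walked; the full matching is never scanned.
    """
    count = 0
    for x in descendants(node1):
        y = matching.partner_in_tree2(x)
        if y is not None and is_descendant(y, node2):
            count += 1
    return count
```

This version always walks the subtree of node1. Picking the
smaller of the two subtrees would need the subtree sizes cached
on the nodes, which I left out to keep the sketch short.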

Besides: good work!

Hotti

[1] http://www.logilab.org/projects/xmldiff/0.6.7

-- 
Daniel HOTTINGER <hodaniel at student.ethz.ch>
Rosenstrasse 4, CH-8152 Zürich
http://hotti.ch/
TEL: +41 44 810 3908
-------------- next part --------------
diff -Nru xmldiff-0.6.7-orig/extensions/maplookup.c xmldiff-0.6.7/extensions/maplookup.c
--- xmldiff-0.6.7-orig/extensions/maplookup.c	2005-05-03 17:28:45.000000000 +0200
+++ xmldiff-0.6.7/extensions/maplookup.c	2005-06-24 18:08:39.327922316 +0200
@@ -1,5 +1,6 @@
 #include <Python.h>
 #include <stdio.h>
+#include <inttypes.h>
 
 char * __revision__ = "$Id: maplookup.c,v 1.9 2005/04/30 11:59:47 ludal Exp $";
 
@@ -156,11 +157,19 @@
     {
       PyObject *key ;
       couple = PyList_GET_ITEM(_mapping, i) ;
-      key = Py_BuildValue("(i,i)", (int)node1, (int)PyTuple_GET_ITEM(couple, 0)) ;
+#if __WORDSIZE == 64
+      key = Py_BuildValue("(l,l)", (size_t)node1, (size_t)PyTuple_GET_ITEM(couple, 0)) ;
+#else
+      key = Py_BuildValue("(i,i)", (size_t)node1, (size_t)PyTuple_GET_ITEM(couple, 0)) ;
+#endif
       if (PyDict_GetItem(_dict1, key) != NULL)
 	{
 	  Py_DECREF(key) ;
-	  key = Py_BuildValue("(i,i)", (int)node2, (int)PyTuple_GET_ITEM(couple, 1)) ;
+#if __WORDSIZE == 64
+	  key = Py_BuildValue("(l,l)", (size_t)node2, (size_t)PyTuple_GET_ITEM(couple, 1)) ;
+#else
+	  key = Py_BuildValue("(i,i)", (size_t)node2, (size_t)PyTuple_GET_ITEM(couple, 1)) ;
+#endif
 	  if (PyDict_GetItem(_dict2, key) != NULL)
 	    {
 	      seq_num += 1 ;

