[Python-projects] xmldiff?

Alexandre Fayolle alexandre.fayolle at logilab.fr
Tue Sep 25 09:16:10 CEST 2007


Hi Dan,

I'm Cc'ing the python-projects list at Logilab where people may have
some better ideas (the main author of xmldiff dwells there).

On Mon, Sep 24, 2007 at 06:19:40PM -0700, Dan Stromberg wrote:
>
> Hi Alexandre.
>
> I'm looking for a simple tool for making small changes to large XML 
> documents.
>
> xmldiff seems a likely candidate.

Depending on how large the XML documents are, xmldiff may not be
suitable. As far as I remember, the diff algorithm it uses is O(n^2),
where n is the number of XML nodes in the document, so the cost of
diffing grows faster than the size of the document.
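
To give a rough feel for what quadratic growth means here, a minimal
sketch follows. It is not xmldiff's actual algorithm: it only counts the
element nodes of a document with the standard library and squares that
number. The helper names count_nodes and estimated_comparisons are made
up for this illustration.

```python
import xml.etree.ElementTree as ET


def count_nodes(xml_text):
    """Count element nodes only (text nodes and comments are not counted)."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter())


def estimated_comparisons(n_nodes):
    """Order-of-magnitude cost for an O(n^2) algorithm (assumption: constant factor 1)."""
    return n_nodes ** 2


if __name__ == "__main__":
    for n in (1000, 10000, 100000):
        print(n, estimated_comparisons(n))
```

Under this model, a document with ten times as many nodes costs about a
hundred times as much to diff.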

> The XML file I'm looking at right now is sometimes edited by humans and
> has comments in it.
>
> It seems like xmldiff can handle this, /except for one small thing/: it 
> seems to specify what to change by number rather than by some sort of tag!
>
> So if someone changed the order in which our stanzas appeared in these 
> files, it seems we'd be out of luck.

I'm not sure I understand what you mean there. A major difference
between xmldiff and standard Unix diff is that xmldiff can detect moved
nodes (instead of reporting a removal and an addition of the same node).
However, since nodes in XML have no identifier other than their position
among their siblings, that index is what xmldiff uses to refer to them.
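
If I read your concern correctly, it can be sketched as follows. This
uses the standard library's ElementTree rather than xmldiff, and the
element name "stanza", the attribute "name" and both sample documents
are invented for the example. A positional reference picks a different
element once the stanzas are reordered, whereas a lookup by attribute
value does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: same stanzas, different order.
ORIGINAL = "<config><stanza name='web'/><stanza name='db'/></config>"
REORDERED = "<config><stanza name='db'/><stanza name='web'/></config>"


def by_position(xml_text, index):
    """Return the name of the index-th <stanza> child (1-based, as in XPath)."""
    root = ET.fromstring(xml_text)
    return root.find("stanza[%d]" % index).get("name")


def by_name(xml_text, name):
    """Return the name of the <stanza> child whose 'name' attribute matches."""
    root = ET.fromstring(xml_text)
    return root.find("stanza[@name='%s']" % name).get("name")


if __name__ == "__main__":
    print(by_position(ORIGINAL, 2), by_position(REORDERED, 2))
    print(by_name(ORIGINAL, "db"), by_name(REORDERED, "db"))
```

The attribute lookup only works when the documents carry some unique
key, which XML in general does not guarantee.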

> Does that sound right to you?
>
> Are you aware of any related tools that might work better?

No, but I don't work much in that field.

A commonly used approach seems to be to normalize the way the XML
content is presented and then apply traditional Unix diff to the
resulting files.
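
A minimal sketch of that idea, using only the Python standard library
(difflib stands in for Unix diff here), could look like the code below.
The normalization rules are my own assumptions: one line per element,
attributes sorted by name, surrounding whitespace stripped. The output
is a lossy outline, not valid XML: tail text is ignored, and comments
are dropped by the default ElementTree parser, which matters for a file
like yours.

```python
import difflib
import xml.etree.ElementTree as ET


def normalize(xml_text):
    """Return one line per element: indented tag, sorted attributes, stripped text."""
    lines = []

    def walk(elem, depth):
        attrs = "".join(' %s="%s"' % (k, v) for k, v in sorted(elem.attrib.items()))
        text = (elem.text or "").strip()
        line = "%s<%s%s> %s" % ("  " * depth, elem.tag, attrs, text)
        lines.append(line.rstrip())
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return lines


def normalized_diff(old_xml, new_xml):
    """Line-based unified diff of the two normalized outlines."""
    return list(difflib.unified_diff(
        normalize(old_xml), normalize(new_xml), "old", "new", lineterm=""))


if __name__ == "__main__":
    old = '<a x="1" y="2"><b>hi</b></a>'
    new = '<a y="2" x="1">\n  <b>bye</b>\n</a>'
    print("\n".join(normalized_diff(old, new)))
```

Differences in attribute order or indentation disappear after
normalization, so only real content changes show up in the diff.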

> How hard would it be to make xmldiff work the way I'm hoping for?

No idea. 

> Thanks!

You are welcome. 

> -- 
> Dan Stromberg
> DATAllegro Systems Engineering
> 1 (877) 470-DATA (3282)
> dstromberg at datallegro.com
> www.datallegro.com <http://www.datallegro.com/>

Regards,

-- 
Alexandre Fayolle                              LOGILAB, Paris (France)
Formations Python, Zope, Plone, Debian:  http://www.logilab.fr/formations
Développement logiciel sur mesure:       http://www.logilab.fr/services
Informatique scientifique:               http://www.logilab.fr/science
Reprise et maintenance de sites CPS:     http://www.migration-cms.com/

