[Python-projects] xmldiff?

Dan Stromberg dstromberg at datallegro.com
Wed Sep 26 01:52:55 CEST 2007


Alexandre Fayolle wrote:
> Hi Dan,
>
> I'm Cc'ing the python-projects list at Logilab where people may have
> some better ideas (the main author of xmldiff dwells there).
>   
Good idea/thanks.
> On Mon, Sep 24, 2007 at 06:19:40PM -0700, Dan Stromberg wrote:
>   
>> Hi Alexandre.
>>
>> I'm looking for a simple tool for making small changes to large XML 
>> documents.
>>
>> xmldiff seems a likely candidate.
>>     
>
> Depending on how large the XML documents are, xmldiff may not be suitable.
> As far as I remember, the diff algorithm used is O(n^2), where n is the
> number of XML nodes in the document, so the cost of diffing grows faster
> than the size of the document.
>   
I've tried xmldiff against a "large" (for my employer) XML document, and 
performance was acceptable.  I'll keep the O(n^2)-ness in mind.  BTW, 
could it not be reduced to O(n log n) with some sorting, or perhaps even 
O(c*n) with some hashing?
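
To make the hashing idea concrete, here is a rough sketch of what I have in 
mind.  It is not xmldiff's algorithm, and it does not produce a full diff: 
it only hashes every subtree once, bottom-up, so that subtrees which are 
identical in both documents can be paired through a dictionary lookup 
instead of pairwise comparison.  The function names are made up for 
illustration, and ElementTree is just an assumed parser.

```python
import hashlib
import xml.etree.ElementTree as ET


def subtree_digest(elem, table):
    """Hash elem's subtree bottom-up and record every digest in table."""
    h = hashlib.sha1()
    h.update(elem.tag.encode("utf-8"))
    # Sort attributes so that attribute order does not affect the digest.
    for name, value in sorted(elem.attrib.items()):
        h.update(("\0%s=%s" % (name, value)).encode("utf-8"))
    h.update(b"\0" + (elem.text or "").strip().encode("utf-8"))
    for child in elem:
        # Each child contributes a fixed-size digest, so every node is
        # visited once and the total work stays roughly linear.
        h.update(subtree_digest(child, table))
        h.update(b"\0" + (child.tail or "").strip().encode("utf-8"))
    digest = h.digest()
    table.setdefault(digest, []).append(elem)
    return digest


def unchanged_subtrees(old_root, new_root):
    """Pair up subtrees whose digests occur in both documents."""
    old_table, new_table = {}, {}
    subtree_digest(old_root, old_table)
    subtree_digest(new_root, new_table)
    return [(old_table[d][0], new_table[d][0])
            for d in old_table if d in new_table]


old = ET.fromstring(
    '<beans><bean id="a"><p>1</p></bean><bean id="b"><p>2</p></bean></beans>')
new = ET.fromstring(
    '<beans><bean id="b"><p>2</p></bean><bean id="a"><p>3</p></bean></beans>')
pairs = unchanged_subtrees(old, new)
```

Whatever does not get paired this way would still need a real diff 
algorithm, so this would at best shrink the n that the O(n^2) part sees.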
>> The XML file I'm looking at right now is sometimes edited by humans, and 
>> has comments in it.
>>
>> It seems like xmldiff can handle this, /except for one small thing/: it 
>> seems to specify what to change by number rather than by some sort of tag!
>>
>> So if someone changed the order in which our stanzas appeared in these 
>> files, it seems we'd be out of luck.
>>     
>
> I'm not sure I understand what you mean there. A large difference between
> xmldiff and standard unix diff is its ability to detect moved nodes
> (instead of reporting a given node as removed and then added). However,
> since nodes in XML don't have ids other than the index of their position
> among their siblings, this is what xmldiff uses.
>   
I may be missing something, but it seems to me that you could specify 
"If /x/y/z is changed, then when applying the diff, hunt for a z under 
/x/y and change z there", irrespective of what line it appears on.

In our case, we have a lot of Java beans, which are identified by an "id" 
attribute.  Their order doesn't matter much.
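
A small sketch of the kind of addressing I mean, assuming a made-up layout 
of <bean id="..."> elements containing <property name="..." value="..."/> 
children (our real files may differ, and set_bean_property is a 
hypothetical helper, not part of xmldiff).  The bean is located by its id, 
so reordering the stanzas makes no difference.  Note that ElementTree's 
default parser drops comments, so this only illustrates the lookup.

```python
import xml.etree.ElementTree as ET


def set_bean_property(root, bean_id, prop_name, new_value):
    """Change one property of the bean with the given id.

    Returns True if a matching bean and property were found.
    """
    for bean in root.iter("bean"):
        if bean.get("id") != bean_id:
            continue
        for prop in bean.findall("property"):
            if prop.get("name") == prop_name:
                prop.set("value", new_value)
                return True
    return False


# The same change applies regardless of the order of the stanzas.
doc_a = ET.fromstring(
    '<beans>'
    '<bean id="db"><property name="port" value="5432"/></bean>'
    '<bean id="web"><property name="port" value="80"/></bean>'
    '</beans>')
doc_b = ET.fromstring(
    '<beans>'
    '<bean id="web"><property name="port" value="80"/></bean>'
    '<bean id="db"><property name="port" value="5432"/></bean>'
    '</beans>')
changed_a = set_bean_property(doc_a, "web", "port", "8080")
changed_b = set_bean_property(doc_b, "web", "port", "8080")
```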
>> Does that sound right to you?
>>
>> Are you aware of any related tools that might work better?
>>     
>
> No, but I'm not very familiar with that field. 
>   
OK.
> An approach which seems commonly used is to somehow normalize the way
> the XML content is presented and to apply traditional unix diff on the
> resulting files. 
>   
This is interesting.

Do you happen to know of any FLOSS tools for XML normalization?  Do any 
of them preserve comments and whitespace to some extent?  If you have a 
favorite, I may try it first - otherwise I'll just "go fish" in Google.
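
For reference, the normalize-then-diff approach might look roughly like the 
sketch below.  It assumes a Python whose standard library provides 
xml.etree.ElementTree.canonicalize (3.8 or later) as the normalizer, and it 
uses difflib in place of the unix diff command.  Comments are kept 
(with_comments=True), but whitespace around text is deliberately discarded 
(strip_text=True), so this does not preserve whitespace.  The line 
splitting on "><" is a crude assumption of mine to get one tag per line, 
and the function names are made up.

```python
import difflib
import xml.etree.ElementTree as ET


def normalized_lines(xml_text):
    """Canonicalize XML text and split it into roughly one tag per line."""
    canonical = ET.canonicalize(xml_text, with_comments=True, strip_text=True)
    # Crude line breaking: start a new line between adjacent tags.
    return canonical.replace("><", ">\n<").splitlines()


def xml_unified_diff(old_text, new_text):
    """Return a unified diff of the two documents after normalization."""
    return list(difflib.unified_diff(
        normalized_lines(old_text), normalized_lines(new_text),
        "old", "new", lineterm=""))


old_xml = '<beans>\n  <!-- note -->\n  <bean  b="2" a="1"/>\n</beans>'
same_xml = '<beans><!-- note --><bean a="1" b="2"></bean></beans>'
new_xml = '<beans><!-- note --><bean a="1" b="3"/></beans>'

no_changes = xml_unified_diff(old_xml, same_xml)
changes = xml_unified_diff(old_xml, new_xml)
```

Here old_xml and same_xml differ only in layout and attribute order, so 
they normalize to the same lines, while new_xml shows up as a one-line 
change.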

Thanks again.


