[Python-projects] xmldiff?
Dan Stromberg
dstromberg at datallegro.com
Wed Sep 26 01:52:55 CEST 2007
Alexandre Fayolle wrote:
> Hi Dan,
>
> I'm Cc'ing the python-projects list at Logilab where people may have
> some better ideas (the main author of xmldiff dwells there).
>
Good idea/thanks.
> On Mon, Sep 24, 2007 at 06:19:40PM -0700, Dan Stromberg wrote:
>
>> Hi Alexandre.
>>
>> I'm looking for a simple tool for making small changes to large XML
>> documents.
>>
>> xmldiff seems a likely candidate.
>>
>
> Depending on how large the xml documents, xmldiff may not be suitable.
> The diff algorithm used is in O(n^2) where n is the number of XML nodes
> in the document as far as I remember, so the cost of diffing increases
> faster than the increase in size of the document.
>
I've tried xmldiff against a "large" (for my employer) XML document, and
performance was acceptable. I'll keep the O(n^2)-ness in mind. BTW,
could it not be reduced to O(n log n) with some sorting, or perhaps even
O(c*n) with some hashing?
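
To make the hashing idea a bit more concrete, here is a rough sketch. It
is not how xmldiff works, and the function names are made up for
illustration. Each subtree gets a digest computed bottom-up from its tag,
attributes, text and child digests, so every node is visited once, and
subtrees that are identical in both documents can then be paired by
lookup. It only finds unchanged subtrees; it does not produce an edit
script, and it ignores tail text.

```python
import hashlib
import xml.etree.ElementTree as ET


def subtree_hashes(root):
    """Return (element, digest) pairs in post-order, one per subtree."""
    pairs = []

    def visit(elem):
        h = hashlib.sha1()
        h.update(elem.tag.encode("utf-8"))
        # Sort attributes so their order in the file does not matter.
        for name, value in sorted(elem.attrib.items()):
            h.update(b"\0" + name.encode("utf-8") + b"=" + value.encode("utf-8"))
        h.update(b"\0" + (elem.text or "").strip().encode("utf-8"))
        # Child digests are fixed-size, so each node costs O(own content).
        for child in elem:
            h.update(visit(child))
        digest = h.digest()
        pairs.append((elem, digest))
        return digest

    visit(root)
    return pairs


def unchanged_subtrees(old_root, new_root):
    """Elements of the new tree whose whole subtree also occurs in the old tree."""
    old_digests = {digest for _, digest in subtree_hashes(old_root)}
    return [elem for elem, digest in subtree_hashes(new_root)
            if digest in old_digests]
```

Anything not returned by unchanged_subtrees() would still need a real
diff, so this would at best shrink the input to the O(n^2) step.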
>> The XML file I'm looking at right now, is sometimes edited by humans, and
>> has comments in it.
>>
>> It seems like xmldiff can handle this, /except for one small thing/: it
>> seems to specify what to change by number rather than by some sort of tag!
>>
>> So if someone changed the order in which our stanzas appeared in these
>> files, it seems we'd be out of luck.
>>
>
> I'm not sure I'm getting what you mean there. A large difference between
> xmldiff and standard unix diff is the ability of seeing moved nodes
> (instead of removing and adding a given node). However since nodes in
> XML don't have ids other than the index of their position among their
> siblings, this is what xmldiff uses.
>
I may be missing something, but it seems to me that you could specify
"If /x/y/z is changed, then when applying the diff, hunt for a z under
/x/y and change z there", irrespective of what line it appears on.
In our case, we have a lot of java beans, which are named by "id".
Their order doesn't matter much.
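
Roughly what I have in mind, as a sketch using the standard library's
ElementTree rather than anything from xmldiff. The element and attribute
names (bean, property, name, value) and the set_property() helper are
assumptions for illustration; only the "id" attribute comes from our
actual files.

```python
import xml.etree.ElementTree as ET


def set_property(root, bean_id, prop_name, new_value):
    """Change one property of the bean with the given id, wherever it sits.

    Returns True if a matching property was found and updated.
    """
    for bean in root.iter("bean"):          # hypothetical element name
        if bean.get("id") != bean_id:
            continue
        for prop in bean.findall("property"):   # hypothetical element name
            if prop.get("name") == prop_name:
                prop.set("value", new_value)
                return True
    return False
```

The lookup is by id, so reordering the stanzas makes no difference. Note
that ElementTree's default parser discards comments, so a real tool would
need a comment-preserving parser.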
>> Does that sound right to you?
>>
>> Are you aware of any related tools that might work better?
>>
>
> No, but I'm not too much in that field.
>
OK.
> An approach which seems commonly used is to somehow normalize the way
> the XML content is presented and to apply traditional unix diff on the
> resulting files.
>
This is interesting.
Do you happen to know of any FLOSS tools for XML normalization? Do any
of them preserve comments and whitespace to some extent? If you have a
favorite, I may try it first - otherwise I'll just "go fish" in Google.
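
As I understand the approach, it would look something like the sketch
below, which uses only the Python standard library (and assumes a Python
new enough to have xml.etree.ElementTree.canonicalize, i.e. 3.8 or
later). The normalize() and xml_diff() names are made up for
illustration. Canonicalization sorts attributes and keeps comments;
insignificant whitespace is stripped rather than preserved, and minidom
is then used only to put one node per line so a line-based diff is
readable.

```python
import difflib
import xml.etree.ElementTree as ET
from xml.dom import minidom


def normalize(xml_text):
    """Return a canonical, one-node-per-line rendering of an XML string."""
    # C14N 2.0: attributes sorted, comments kept, surrounding whitespace dropped.
    canonical = ET.canonicalize(xml_text, with_comments=True, strip_text=True)
    # Re-indent so that each element or comment lands on its own line.
    return minidom.parseString(canonical).toprettyxml(indent="  ")


def xml_diff(old_text, new_text):
    """Unified diff (as a list of lines) of the normalized forms."""
    return list(difflib.unified_diff(
        normalize(old_text).splitlines(),
        normalize(new_text).splitlines(),
        fromfile="old", tofile="new", lineterm="",
    ))
```

Two files that differ only in attribute order or indentation should then
produce an empty diff, while a changed attribute shows up as a one-line
change.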
Thanks again.