DiffXML: Change Detection in XML Data

In this paper, we introduce a method for mapping XML files to the relational data model: each XML file is parsed as a DOM tree, and the value and path information for each node is stored in relational tables. We present an algorithm called “DiffXML” that uses SQL operations to detect changes between two versions of an XML file stored in a relational database, relying on the stored value and path information to identify differences. DiffXML finds newly inserted, deleted, and updated nodes, and also detects the move of a subtree from one place to another in the XML DOM tree. We compare the performance of DiffXML with that of several current commercial and research-prototype XML change detection tools.
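As an illustration only, the following Python sketch shows one way the general idea could look: flatten a DOM tree into (path, value) rows, store two versions in a relational table, and use SQL to find inserted, deleted, and updated nodes. It is not the DiffXML implementation. The table layout `node(version, path, value)`, the positional path scheme (for example `/book[1]/price[1]`), and the helper names `make_db`, `flatten`, `load`, and `diff` are all assumptions made for this example, and subtree move detection is left out.

```python
import sqlite3
from xml.dom import minidom


def make_db():
    """Create an in-memory database with a hypothetical node table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (version INTEGER, path TEXT, value TEXT)")
    return conn


def flatten(node, path=""):
    """Yield (path, value) for every element below `node`.

    The value is the element's direct text content; the path uses a
    positional index among same-named siblings so that it is unique.
    """
    counts = {}
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        counts[child.tagName] = counts.get(child.tagName, 0) + 1
        child_path = f"{path}/{child.tagName}[{counts[child.tagName]}]"
        text = "".join(
            c.data for c in child.childNodes if c.nodeType == c.TEXT_NODE
        ).strip()
        yield child_path, text
        yield from flatten(child, child_path)


def load(conn, version, xml_text):
    """Parse one XML version as a DOM tree and store its nodes as rows."""
    doc = minidom.parseString(xml_text)
    conn.executemany(
        "INSERT INTO node (version, path, value) VALUES (?, ?, ?)",
        [(version, p, v) for p, v in flatten(doc)],
    )


def diff(conn, old, new):
    """Return (inserted paths, deleted paths, updated (path, old, new) rows)."""
    only_in = (
        "SELECT a.path FROM node a WHERE a.version = ? AND NOT EXISTS "
        "(SELECT 1 FROM node b WHERE b.version = ? AND b.path = a.path) "
        "ORDER BY a.path"
    )
    inserted = [r[0] for r in conn.execute(only_in, (new, old))]
    deleted = [r[0] for r in conn.execute(only_in, (old, new))]
    updated = conn.execute(
        "SELECT o.path, o.value, n.value FROM node o "
        "JOIN node n ON n.path = o.path "
        "WHERE o.version = ? AND n.version = ? AND o.value <> n.value "
        "ORDER BY o.path",
        (old, new),
    ).fetchall()
    return inserted, deleted, updated


if __name__ == "__main__":
    conn = make_db()
    load(conn, 1, "<book><title>XML</title><price>10</price></book>")
    load(conn, 2, "<book><title>XML</title><price>12</price></book>")
    print(diff(conn, 1, 2))
```

Because this sketch matches nodes purely by positional path, inserting a sibling shifts the indices of later same-named siblings and makes them appear changed. A fuller approach would also need to match nodes by value in order to recognize moved subtrees.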