A Secure Comparison Technique for Tree Structured Data

Comparing different versions of large tree-structured data is a CPU- and memory-intensive task. State-of-the-art techniques require the complete XML trees and their internal representations to be loaded into memory before any comparison can start. Furthermore, these techniques do not address the comparison of sanitized XML trees. We propose a comparison technique for sanitized XML documents that produces a minimum-cost edit script transforming the initial tree into the target tree. The method uses encrypted integer labels to encode the original XML structure and content, so that the encrypted XML is readable only by a legitimate party. A third party can compare the encoded tree nodes using only a limited intermediate representation.
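
As a rough illustration of how a third party might compare encoded nodes without seeing their content, the following minimal sketch assigns each node a keyed integer label and diffs the two label sets. It is not the encoding proposed in the paper: the functions `label`, `sanitize`, and `compare`, the path-plus-text node model, and the use of HMAC-SHA256 are all assumptions made for illustration. HMAC labels are one-way rather than decryptable, and a set difference only reports inserted and deleted nodes; it does not yield a minimum-cost edit script.

```python
import hashlib
import hmac


def label(key: bytes, path: str, text: str) -> int:
    """Return a keyed 64-bit integer label for one node (hypothetical encoding)."""
    # A NUL byte separates path and text so the pair is encoded unambiguously.
    msg = f"{path}\x00{text}".encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def sanitize(key: bytes, nodes):
    """Encode a tree, given as (path, text) pairs, as a set of integer labels."""
    return {label(key, path, text) for path, text in nodes}


def compare(old_labels, new_labels):
    """Third-party comparison: works on labels only, never on plaintext."""
    deleted = old_labels - new_labels
    inserted = new_labels - old_labels
    return deleted, inserted


# The key is held by the legitimate party; the third party sees only labels.
key = b"demo-key"
old_tree = [("/doc", ""), ("/doc/title", "Draft"), ("/doc/body", "Hello")]
new_tree = [("/doc", ""), ("/doc/title", "Final"), ("/doc/body", "Hello")]

deleted, inserted = compare(sanitize(key, old_tree), sanitize(key, new_tree))
```

In this sketch, the changed title shows up as one deleted and one inserted label, while the unchanged nodes cancel out. The legitimate party, who holds the key, can recompute the labels to map the reported differences back to actual nodes.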