Approximate Subtree Identification in Heterogeneous XML Documents Collections

Because XML data in internet applications is heterogeneous, exact matching of queries is often inadequate. The need therefore arises to quickly identify subtrees of XML documents in a collection that are similar to a given pattern. In this paper we discuss different measures of similarity between a pattern and subtrees of documents in the collection. We then introduce an efficient algorithm that uses indexing structures to identify document subtrees that approximately conform to the pattern.
