Reducing Graph Matching to Tree Matching for XML Queries with ID References

ID/IDREF is an important and widely used feature in XML documents for eliminating data redundancy. Most existing algorithms treat an XML document with ID references as a graph and perform graph matching for queries involving ID references. Graph matching is inherently more complex than the tree matching algorithms originally used to process XML queries. In this paper, we exploit the semantics of ID/IDREF to reduce graph matching to tree matching when processing queries involving ID references. With our approach, an XML document with ID/IDREF is not treated as a graph. Instead, a general query with ID references is decomposed and processed using tree pattern matching techniques, which are more efficient than graph matching. Furthermore, our approach can handle complex ID references, such as cyclic and sequential references, which existing approaches cannot handle efficiently. Experimental results show that our approach is 20-50% faster than MonetDB, an XQuery engine, and at least 100 times faster than TwigStackD, an existing graph matching algorithm.
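The core idea of answering an ID-reference query without treating the document as a graph can be illustrated with a minimal sketch. The abstract does not specify how queries are decomposed or how the pieces are recombined, so the following is an assumption: the query is split at the IDREF edge into two tree patterns, each pattern is matched over the ordinary XML tree, and the results are combined with a hash join on the IDREF value equalling the ID value. The sample document, the element names (`open_auction`, `seller`, `person`), and the functions `match_sellers`, `match_persons`, and `sellers_of_expensive_auctions` are all hypothetical, loosely modeled on XMark-style data. This is not the paper's algorithm.

```python
import xml.etree.ElementTree as ET

# Hypothetical XMark-style document, invented for illustration only.
DOC = """<site>
  <people>
    <person id="p1"><name>Alice</name></person>
    <person id="p2"><name>Bob</name></person>
  </people>
  <open_auctions>
    <open_auction id="a1"><seller person="p1"/><initial>15</initial></open_auction>
    <open_auction id="a2"><seller person="p2"/><initial>40</initial></open_auction>
  </open_auctions>
</site>"""


def match_sellers(root, min_initial):
    """Tree pattern 1: //open_auction[initial > min_initial]/seller/@person.

    Returns (auction id, IDREF value) pairs; the reference is NOT followed here.
    """
    rows = []
    for auction in root.iter("open_auction"):
        initial = auction.findtext("initial")
        if initial is None or float(initial) <= min_initial:
            continue
        for seller in auction.findall("seller"):
            ref = seller.get("person")
            if ref is not None:
                rows.append((auction.get("id"), ref))
    return rows


def match_persons(root):
    """Tree pattern 2: //person[@id]/name, indexed by its ID value."""
    return {
        person.get("id"): person.findtext("name")
        for person in root.iter("person")
        if person.get("id") is not None
    }


def sellers_of_expensive_auctions(xml_text, min_initial):
    """Answer the ID-reference query with two tree matches plus a value join on IDREF = ID."""
    root = ET.fromstring(xml_text)
    names_by_id = match_persons(root)  # build side of the hash join
    return [
        (auction_id, names_by_id[ref])
        for auction_id, ref in match_sellers(root, min_initial)
        if ref in names_by_id  # dangling IDREFs produce no result
    ]


print(sellers_of_expensive_auctions(DOC, 20))  # → [('a2', 'Bob')]
```

Neither tree match ever traverses a reference edge; the reference only appears as an equality predicate in the join. Under the same assumption, a sequential reference (A refers to B, which refers to C) would become a chain of such joins, and a cyclic reference would add a join condition back to a pattern that has already been matched.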
