Indexing and querying XML using extended Dewey labeling scheme

Finding all occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document: each label records the path to an element, which facilitates XML query processing. To improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which extends the existing Dewey labeling scheme so that a single label combines the types and identifiers of elements. As a result, the labels of internal query nodes need not be scanned, which reduces the I/O cost of query processing. Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL, which are based on the tag+level data partition scheme and further reduce I/O costs by level pruning. Finally, we report comprehensive experimental results showing that our XML tree pattern matching algorithms are superior to existing approaches in terms of the number of elements scanned, the size of intermediate results, and query performance.
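The abstract does not spell out how a label can carry element types as well as identifiers. The following is a minimal sketch of one way to do it, not the paper's definitive encoding. It assumes a schema that lists the possible child tags of each tag, and it assumes each label component is chosen so that its remainder modulo the number of possible child tags selects the child's tag. The names `CHILD_TAGS`, `assign_labels`, and `decode_path`, and the sample schema and document, are hypothetical and used only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical schema (an assumption for this sketch): for each tag,
# the ordered list of tags its children may have.
CHILD_TAGS: Dict[str, List[str]] = {
    "bib": ["book"],
    "book": ["author", "title", "chapter"],
    "chapter": ["title", "section"],
}

Label = Tuple[int, ...]


def assign_labels(node: tuple, label: Label = ()) -> List[Tuple[Label, str]]:
    """Label a (tag, children) tree in document order.

    Each child's component is the smallest integer larger than its left
    sibling's component whose remainder mod n equals the index of the
    child's tag among the n possible child tags of the parent.
    """
    tag, children = node
    result = [(label, tag)]
    options = CHILD_TAGS.get(tag, [])
    prev = -1  # no left sibling yet
    for child in children:
        k = options.index(child[0])
        prev = prev + 1 + (k - (prev + 1)) % len(options)
        result.extend(assign_labels(child, label + (prev,)))
    return result


def decode_path(root_tag: str, label: Label) -> List[str]:
    """Recover the full tag path of an element from its label alone."""
    path = [root_tag]
    for component in label:
        options = CHILD_TAGS[path[-1]]
        path.append(options[component % len(options)])
    return path


# Hypothetical document: one book with two chapters.
doc = ("bib", [("book", [
    ("author", []),
    ("title", []),
    ("chapter", [("title", []), ("section", [])]),
    ("chapter", [("section", [])]),
])])

if __name__ == "__main__":
    for lbl, tag in assign_labels(doc):
        print(".".join(map(str, lbl)) or "(root)", tag, decode_path("bib", lbl))
```

Under these assumptions, the label of a leaf element is enough to reconstruct the tags of all its ancestors, which is why a matcher would only need to read labels for leaf query nodes and could check internal query nodes against the decoded path.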
