Coding-based Join Algorithms for Structural Queries on Graph-Structured XML Document

In many applications, XML documents need to be modelled as graphs, and query processing on graph-structured XML documents brings new challenges. In this paper, we design a labelling-based method for processing structural queries on graph-structured XML documents. Each node is assigned labels from a reachability labelling scheme. By extending the interval-based reachability labelling scheme for DAGs by Rakesh et al., we design labelling schemes that support reachability tests on general graphs. Based on these labelling schemes, we design graph structural join algorithms that efficiently answer structural queries containing only ancestor-descendant relationships. For subgraph queries, we design a subgraph join algorithm which, with efficient data structures, can process subgraph queries of various structures efficiently. Experimental results show that our algorithms have good performance and scalability.
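To make the idea of interval-based reachability labels concrete, the following is a minimal sketch and not the algorithms of this paper. It assumes that cycles are handled by collapsing each strongly connected component into one node, that the resulting DAG is labelled with postorder intervals propagated from successors, and that a join can be answered by a binary search over postorder numbers. All function names (`build_labels`, `reaches`, `structural_join`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def strongly_connected_components(graph):
    """Tarjan's algorithm: map every node to the representative of its SCC."""
    index, low, on_stack, stack, comp = {}, {}, set(), [], {}

    def visit(u):
        index[u] = low[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        for v in graph.get(u, ()):
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:            # u is the root of an SCC
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = u
                if w == u:
                    break

    for u in list(graph):
        if u not in index:
            visit(u)
    return comp

def merge(intervals):
    """Union of integer intervals, merging overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def build_labels(graph):
    """Label a general directed graph; returns (comp, post, labels)."""
    comp = strongly_connected_components(graph)
    dag = defaultdict(set)                # condensed graph: one node per SCC
    for u, succs in graph.items():
        dag[comp[u]].update(comp[v] for v in succs if comp[v] != comp[u])
    post, labels = {}, {}

    def visit(u):
        start = len(post) + 1             # first postorder number in u's DFS subtree
        for v in dag.get(u, ()):
            if v not in labels:
                visit(v)
        post[u] = len(post) + 1
        intervals = [(start, post[u])]    # covers all spanning-tree descendants
        for v in dag.get(u, ()):
            intervals.extend(labels[v])   # inherit intervals across non-tree edges
        labels[u] = merge(intervals)

    for u in list(dag):
        if u not in labels:
            visit(u)
    return comp, post, labels

def reaches(index, u, v):
    """True if v is reachable from u (every node reaches itself)."""
    comp, post, labels = index
    return any(lo <= post[comp[v]] <= hi for lo, hi in labels[comp[u]])

def structural_join(index, ancestors, descendants):
    """All pairs (a, d) with a reaching d, found by binary search per interval."""
    comp, post, labels = index
    ranked = sorted((post[comp[d]], d) for d in descendants)
    keys = [p for p, _ in ranked]
    pairs = []
    for a in ancestors:
        for lo, hi in labels[comp[a]]:
            for _, d in ranked[bisect_left(keys, lo):bisect_right(keys, hi)]:
                pairs.append((a, d))
    return pairs

# Toy document graph: d and e form a cycle (e.g. mutual ID/IDREF links).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": ["d"], "f": ["c"]}
index = build_labels(graph)
print(reaches(index, "a", "e"))   # → True
print(reaches(index, "b", "c"))   # → False
print(sorted(structural_join(index, ["b", "f"], ["c", "d", "e"])))
```

In this sketch a reachability test costs time proportional to the number of intervals stored at the ancestor, which is why the intervals are merged after propagation. The recursive traversals are adequate for small examples only; a large document would need iterative versions.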
