Cost-Based Query Optimization for Multi Reachability Joins

Many applications need to identify reachability relationships efficiently between different types of objects in a large data graph. A reachability join (R-join) serves as a primitive operator for this purpose. Given two types, A and D, an R-join finds all pairs of an A-typed object and a D-typed object such that the D-typed object is reachable from the A-typed object. In this paper, we focus on processing multi reachability joins (R-joins). In the literature, the state-of-the-art approach extends the well-known twig-stack join algorithm so that it applies to directed acyclic graphs (DAGs). The efficiency of that approach is affected by the density of large DAGs. We present algorithms that optimize R-joins using dynamic programming based on the estimated costs associated with each R-join. Our algorithm is not affected by the density of graphs. We conducted extensive performance studies and report our findings.
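To make the R-join definition concrete, the following is a minimal Python sketch that evaluates a single R-join naively by breadth-first search. It only illustrates the semantics of the operator; it is not the cost-based algorithm of the paper. The function names (`reachable_from`, `r_join`), the adjacency-list representation, and the toy graph are all assumptions made for this illustration.

```python
from collections import deque


def reachable_from(graph, source):
    """Return all nodes reachable from `source` via a directed path of length >= 1."""
    seen = set()
    queue = deque(graph.get(source, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def r_join(graph, node_type, type_a, type_d):
    """Naive R-join: all (a, d) pairs where d (of type_d) is reachable from a (of type_a)."""
    pairs = set()
    for a, t in node_type.items():
        if t != type_a:
            continue
        for d in reachable_from(graph, a):
            if node_type.get(d) == type_d:
                pairs.add((a, d))
    return pairs


# Hypothetical toy DAG: adjacency lists plus a type label per node.
graph = {"a1": ["b1"], "b1": ["d1", "d2"], "a2": ["d2"], "d1": [], "d2": []}
node_type = {"a1": "A", "a2": "A", "b1": "B", "d1": "D", "d2": "D"}

result = r_join(graph, node_type, "A", "D")
print(sorted(result))
```

This sketch runs one graph traversal per A-typed node, which is far too expensive on large graphs; it is meant only to pin down what the operator returns.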
