Evaluating Reachability Queries over Path Collections

Several applications in areas such as biochemistry and GIS involve storing and querying large volumes of sequential data stored as path collections. A number of interesting queries can be posed on such data. This work focuses on reachability queries: given a path collection and two nodes v_s and v_t, determine whether a path from v_s to v_t exists and, if so, identify it. To answer these queries, the path-first search paradigm, which treats paths as first-class citizens, is proposed. To improve the performance of these techniques, two indexing structures that capture the reachability information of paths are introduced. Further, methods for updating a path collection and its indices are discussed. Finally, an extensive experimental evaluation verifies the advantages of the approach.
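
To make the query concrete, the following is a minimal Python sketch of a reachability search that expands whole paths rather than single edges. It is an illustration under stated assumptions, not the algorithm or the indexing structures evaluated in this work: each path is assumed to be a list of node labels whose consecutive nodes form directed edges, and the names `build_node_index` and `reachable` are hypothetical. The node index here is a simple inverted list from a node to its occurrences in the collection.

```python
from collections import defaultdict, deque


def build_node_index(paths):
    """Map each node to the (path_id, position) pairs where it occurs.

    Assumption: a path is a list of node labels, and consecutive
    nodes in a path are connected by a directed edge.
    """
    index = defaultdict(list)
    for pid, path in enumerate(paths):
        for pos, node in enumerate(path):
            index[node].append((pid, pos))
    return index


def reachable(paths, index, source, target):
    """Return a node sequence from source to target, or None.

    Hypothetical helper: instead of following one edge at a time,
    every path containing the current node is scanned forward from
    that node's position, so a whole path suffix is visited at once.
    """
    if source == target:
        return [source]
    parent = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for pid, pos in index.get(node, []):
            prev = node
            for nxt in paths[pid][pos + 1:]:
                if nxt in parent:
                    # nxt was already discovered and enqueued, so the rest
                    # of this path is scanned when nxt itself is expanded.
                    break
                parent[nxt] = prev
                if nxt == target:
                    # Rebuild the answer by walking parent links backwards.
                    result = [nxt]
                    while parent[result[-1]] is not None:
                        result.append(parent[result[-1]])
                    return result[::-1]
                queue.append(nxt)
                prev = nxt
    return None


paths = [["a", "b", "c"], ["c", "d", "e"], ["x", "b", "y"]]
index = build_node_index(paths)
print(reachable(paths, index, "a", "e"))
print(reachable(paths, index, "e", "a"))
```

In this toy collection, node e is reached from a by following the first path to c and then the second path; the reverse query finds nothing because edges are directed. The sketch rescans path suffixes at query time, which is the kind of cost that precomputed reachability indices over paths are meant to reduce.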
