BMC: An Efficient Method to Evaluate Probabilistic Reachability Queries

The reachability query is a fundamental problem in graph databases. It answers whether a path exists between a source vertex and a destination vertex, and it is widely used in applications including road networks, social networks, the World Wide Web, and bioinformatics. In some important emerging applications, uncertainty is inherent in the graph; for instance, each edge may be associated with a probability of existing. In this paper, we study the reachability problem over such uncertain graphs in a threshold fashion: determining whether a source vertex reaches a destination vertex with probability larger than a user-specified value t. Computing reachability on uncertain graphs has been proven to be NP-hard. We first propose novel and effective bounding techniques to obtain an upper bound on the reachability probability between the source and the destination. If the upper bound fails to prune the query, efficient dynamic Monte Carlo simulation techniques are applied to answer the probabilistic reachability query with an accuracy guarantee. Extensive experiments over real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
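To make the threshold query concrete, the following is a minimal sketch of a plain (non-dynamic) Monte Carlo estimator. It is not the BMC method and it omits the upper-bound pruning step, neither of which is specified in this abstract. It assumes directed edges that exist independently, an adjacency-list dictionary, and a fixed sample size chosen from Hoeffding's inequality; the function names and the parameters eps and delta are hypothetical and introduced only for illustration.

```python
import math
import random
from collections import deque


def sample_reaches(edges, s, d, rng):
    """Sample one possible world lazily and run BFS from s.

    edges maps a vertex to a list of (neighbor, probability) pairs.
    Each outgoing edge is flipped at most once per world, when its
    tail vertex is dequeued, so edges are sampled independently.
    """
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == d:
            return True
        for v, p in edges.get(u, ()):
            # Edges into already-visited vertices cannot change the answer.
            if v not in seen and rng.random() < p:
                seen.add(v)
                queue.append(v)
    return False


def estimate_reachability(edges, s, d, eps=0.05, delta=0.05, seed=0):
    """Estimate Pr[s reaches d] to within eps with probability >= 1 - delta.

    Assumption: the sample size comes from Hoeffding's inequality,
    n >= ln(2 / delta) / (2 * eps^2). This is a generic bound, not the
    accuracy guarantee derived in the paper.
    """
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    rng = random.Random(seed)
    hits = sum(sample_reaches(edges, s, d, rng) for _ in range(n))
    return hits / n


def reaches_with_threshold(edges, s, d, t, **kwargs):
    """Answer the threshold query: is the estimated probability larger than t?"""
    return estimate_reachability(edges, s, d, **kwargs) > t


if __name__ == "__main__":
    # Two routes from a to d: a direct edge, and a path through b.
    # Exact probability: 1 - (1 - 0.5) * (1 - 0.5 * 0.5) = 0.625.
    graph = {"a": [("b", 0.5), ("d", 0.5)], "b": [("d", 0.5)]}
    print(estimate_reachability(graph, "a", "d"))
    print(reaches_with_threshold(graph, "a", "d", t=0.4))
```

In this sketch every query pays the full sampling cost. A cheap upper bound on the reachability probability, as the abstract proposes, would let queries whose bound does not exceed t be rejected without any sampling.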
