Efficient Sampling Methods for Shortest Path Query over Uncertain Graphs

Graphs are widely used to model data. Real-world data, however, are inherently uncertain because of noise and incompleteness in data collection, which is why uncertain graphs have attracted considerable research attention. Existing uncertain graph models assume that all edges in a graph are independent of each other, an assumption that often does not hold in real applications. We therefore propose a new model for uncertain graphs that captures the correlation among edges sharing the same vertex. Using this model, we study the shortest path query, a fundamental and important query on graphs. Because computing the shortest path probability over correlated uncertain graphs is #P-hard, we propose several sampling methods that efficiently compute an approximate answer. We prove that the approximation error of our algorithms is small and verify this in our experiments.
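To make the sampling idea concrete, the sketch below estimates the probability that a given path is the shortest path by naive Monte Carlo sampling of possible worlds. It is not the correlated model or any of the sampling methods proposed in this paper: it assumes the independent-edge baseline that the abstract criticizes, an undirected graph, and distinct path lengths (no ties). The names `shortest_path` and `estimate_path_probability`, the edge tuple layout `(u, v, weight, probability)`, and the toy graph are all hypothetical and chosen only for illustration.

```python
import heapq
import random


def shortest_path(adj, source, target):
    """Dijkstra on one sampled world; returns the path as a tuple, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return None  # target unreachable in this sampled world
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return tuple(reversed(path))


def estimate_path_probability(edges, source, target, path,
                              n_samples=10000, seed=0):
    """Fraction of sampled worlds in which `path` is the shortest path.

    Assumption (not the paper's model): each edge exists independently
    with its own probability.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Sample one possible world by keeping each edge with probability p.
        adj = {}
        for u, v, w, p in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append((v, w))
                adj.setdefault(v, []).append((u, w))
        if shortest_path(adj, source, target) == tuple(path):
            hits += 1
    return hits / n_samples


# Hypothetical toy graph: (u, v, weight, existence probability).
toy_edges = [
    ("s", "a", 1.0, 0.5),
    ("a", "t", 1.0, 0.5),
    ("s", "t", 3.0, 0.9),
]
estimate = estimate_path_probability(toy_edges, "s", "t", ("s", "a", "t"),
                                     n_samples=20000)
```

In this toy graph the path s, a, t is the shortest exactly when both of its edges exist, so under the independence assumption its exact probability is 0.5 × 0.5 = 0.25, and the estimate should approach that value as the number of samples grows. Handling correlation among edges that share a vertex would require replacing the per-edge sampling step with a draw from the joint distribution of those edges.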
