Distributed K-Distance Indexing Approach for Efficient Shortest Path Discovery on Large Graphs

Large real-life networks such as social networks, web page links, and traffic networks form complex graph structures with millions of vertices and edges. Among the many operations used to exploit these graphs, shortest path discovery is a major and expensive one. Besides in-memory approaches, many efficient shortest path computation methods have been developed on top of distributed and parallel platforms. Pregel, a bulk synchronous parallel framework for processing large graphs, is one such platform. The known shortest path computation approach with Pregel is computation-intensive and cannot support real-time services. In this paper, we propose a Pregel-based k-distance index technique that allows efficient single-pair shortest path discovery. We reduce network cost and unnecessary operations by transmitting more information in a single superstep. Extensive experiments on both real and synthetic datasets demonstrate the superiority of the proposed approach.
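For orientation, the baseline that the abstract calls computation-intensive can be pictured as a vertex-centric single-source shortest path computation run in supersteps. The sketch below is a single-machine simulation of that general style, not the k-distance index proposed in this paper and not Pregel's actual API. The function name `pregel_sssp`, the adjacency-list layout, and the sample graph are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def pregel_sssp(graph, source):
    """Simulate vertex-centric SSSP in supersteps (illustrative only).

    graph: dict mapping every vertex to a list of (neighbor, weight) pairs.
    Returns (distances, number_of_supersteps).
    """
    dist = {v: math.inf for v in graph}
    # Before the first superstep, only the source has a pending message.
    inbox = {source: [0]}
    supersteps = 0
    while inbox:
        outbox = defaultdict(list)
        # Each vertex with incoming messages runs its compute step.
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                # Propagate the improved distance along outgoing edges.
                for neighbor, weight in graph[v]:
                    outbox[neighbor].append(best + weight)
        # Synchronization barrier: messages are delivered in the next superstep.
        inbox = outbox
        supersteps += 1
    return dist, supersteps


# Hypothetical sample graph: a -> b -> c -> d, plus a heavier a -> c edge.
sample_graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2)],
    "c": [("d", 1)],
    "d": [],
}
distances, steps = pregel_sssp(sample_graph, "a")
print(distances, steps)
```

In this simulation every improvement travels only one hop per superstep, so the number of supersteps grows with the hop length of the paths. That per-hop messaging is the cost the abstract aims to reduce by sending more information in a single superstep.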
