The shortest path is not always a straight line

In this paper, we leverage the concept of the metric backbone to improve the efficiency of large-scale graph analytics. The metric backbone is the minimum subgraph that preserves the shortest paths of a weighted graph. We use the metric backbone in place of the original graph to compute various graph metrics, either exactly or with good approximation. By computing on a smaller graph, we improve the performance of graph analytics applications on two different systems: a batch graph processing system and a graph database. Further, we provide an algorithm for computing the metric backbone on large graphs. While one can compute the metric backbone by solving the all-pairs shortest-paths problem, this approach incurs prohibitive time and space complexity for big graphs. Instead, we propose a heuristic that makes computing the metric backbone practical even for large graphs. Additionally, we analyze several real datasets of different sizes and domains and show that we can approximate the metric backbone by removing only first-order semi-metric edges, that is, edges for which a shorter two-hop path exists. We provide a distributed implementation of our algorithm and apply it in large-scale scenarios. We evaluate our algorithm using a variety of real graphs, including a Facebook social network subgraph of ~50 billion edges. We measure the impact of using the metric backbone on runtime performance in two graph management systems. We achieve query speedups of up to 6.7x in the Neo4j commercial graph database and job speedups of up to 6x in the Giraph graph processing system.
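To make the distinction between the exact backbone and its first-order approximation concrete, the following is a minimal single-machine sketch, not the distributed algorithm described in the paper. It assumes an undirected graph with positive edge weights, no self-loops, and no parallel edges, given as `(u, v, weight)` tuples. The function names (`build_adjacency`, `remove_first_order_semimetric`, `exact_backbone`) and the toy graph are hypothetical and chosen only for illustration. The exact variant keeps an edge when no strictly shorter path joins its endpoints, so it does not break ties between equally short alternatives.

```python
import heapq
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: {neighbor: weight}}."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def remove_first_order_semimetric(edges):
    """Drop every edge for which a strictly shorter two-hop path exists."""
    adj = build_adjacency(edges)
    kept = []
    for u, v, w in edges:
        # Scan the smaller neighborhood and probe the larger one.
        a, b = (u, v) if len(adj[u]) <= len(adj[v]) else (v, u)
        semi_metric = any(
            k in adj[b] and adj[a][k] + adj[b][k] < w
            for k in adj[a]
            if k != b
        )
        if not semi_metric:
            kept.append((u, v, w))
    return kept


def shortest_distance(adj, source, target):
    """Dijkstra's algorithm, stopping as soon as the target is settled."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in adj[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")


def exact_backbone(edges):
    """Keep an edge only if no strictly shorter path joins its endpoints."""
    adj = build_adjacency(edges)
    return [(u, v, w) for u, v, w in edges if shortest_distance(adj, u, v) >= w]


# Toy graph (illustrative data, not from the paper).
edges = [
    ("a", "b", 1.0),
    ("b", "c", 1.0),
    ("c", "d", 1.0),
    ("a", "c", 3.0),  # first-order semi-metric: a-b-c costs 2 < 3
    ("a", "d", 4.0),  # semi-metric only via the three-hop path a-b-c-d (cost 3)
]
approx = remove_first_order_semimetric(edges)
exact = exact_backbone(edges)
```

On this toy graph the first-order pass removes `a-c` but keeps `a-d`, because the only two-hop alternative (`a-c-d`) costs exactly 4; the exact computation removes both. The first-order result is therefore a superset of the exact backbone, obtained with only local neighborhood checks rather than a shortest-path search per edge.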
