Orion: Shortest Path Estimation for Large Social Graphs

Through measurements, researchers continue to produce large social graphs that capture relationships, transactions, and social interactions between users. Efficient analysis of these graphs requires algorithms that scale well with graph size. We examine node distance computation, a critical primitive in graph problems such as node separation, centrality computation, mutual friend detection, and community detection. For social graphs with millions of nodes, computing even a single shortest path using traditional breadth-first search can take several seconds. In this paper, we propose a novel node distance estimation mechanism that maps nodes in high-dimensional graphs to positions in low-dimensional Euclidean coordinate spaces, thus allowing constant-time node distance computation. We describe Orion, a prototype graph coordinate system, and explore critical decisions in its design. Finally, we evaluate the accuracy of Orion's node distance estimates, and show that it can produce accurate results in applications such as node separation, node centrality, and ranked social search.
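To make the contrast concrete, the following is a minimal sketch, not Orion's implementation. It compares an exact breadth-first search with a coordinate-based estimate on a toy path graph. The function names, the graph, and the hand-picked 2-D coordinates are all assumptions made for illustration; how coordinates are actually computed is not specified here, so the sketch does not attempt it.

```python
import math
from collections import deque


def bfs_distance(adj, src, dst):
    """Exact hop count via breadth-first search; work grows with graph size."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj[node]:
            if nbr == dst:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None  # dst is unreachable from src


def estimated_distance(coords, u, v):
    """Estimate distance as the Euclidean distance between two coordinates.

    The cost depends only on the number of dimensions, not on graph size.
    """
    return math.dist(coords[u], coords[v])


# Toy path graph a - b - c - d (hypothetical example data).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

# Hand-picked coordinates that happen to embed this path exactly.
# Real social graphs will generally not embed without error.
coords = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "d": (3.0, 0.0)}

exact = bfs_distance(adj, "a", "d")
estimate = estimated_distance(coords, "a", "d")
```

In this toy case the estimate matches the exact hop count because a path embeds perfectly on a line. On a real graph the estimate would carry some error, which is why accuracy has to be evaluated per application.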
