Parallel Maximum Clique Algorithms with Applications to Network Analysis

We present a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits roughly linear runtime scaling on real-world networks ranging from a thousand to a hundred million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. At its heart, the algorithm employs a branch-and-bound strategy with novel and aggressive pruning techniques. The pruning combines the core numbers of vertices with a good initial heuristic solution to remove the vast majority of the search space. In addition, the exploration of the search tree is parallelized. During the search, processes immediately communicate changes to the upper and lower bounds on the size of the maximum clique. This exchange of information occasionally yields a superlinear speedup, because tasks with large search spaces can be pruned using bounds discovered by other processes. We de...
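The pruning idea described above can be illustrated with a minimal, sequential sketch. It is not the authors' implementation: the function names (`build_adj`, `core_numbers`, `greedy_clique`, `max_clique`) are hypothetical, the core decomposition uses a simple quadratic peeling loop rather than a linear-time method, and the parallel exchange of bounds between processes is omitted. It relies only on the fact that a vertex with core number k can belong to a clique of size at most k + 1.

```python
def build_adj(edges):
    """Hypothetical helper: undirected edge list -> {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def core_numbers(adj):
    """Repeatedly peel a minimum-degree vertex; returns {vertex: core number}."""
    deg = {v: len(ns) for v, ns in adj.items()}
    core, remaining, k = {}, set(adj), 0
    while remaining:
        v = min(remaining, key=deg.__getitem__)  # quadratic overall, fine for a sketch
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core


def greedy_clique(adj, core):
    """Heuristic lower bound: grow a clique greedily, preferring high core numbers."""
    best = []
    for v in sorted(adj, key=core.__getitem__, reverse=True):
        if core[v] + 1 <= len(best):
            break  # descending order: no later vertex can improve on best
        clique, cand = [v], set(adj[v])
        while cand:
            u = max(cand, key=core.__getitem__)
            clique.append(u)
            cand &= adj[u]  # keep only vertices adjacent to every clique member
        if len(clique) > len(best):
            best = clique
    return best


def max_clique(adj):
    """Branch and bound, pruned by core numbers and the heuristic solution."""
    core = core_numbers(adj)
    best = greedy_clique(adj, core)

    def expand(clique, cand):
        nonlocal best
        if len(clique) > len(best):
            best = clique[:]
        cand = set(cand)
        # Bound: even taking every candidate cannot beat best, so stop.
        while cand and len(clique) + len(cand) > len(best):
            v = cand.pop()
            expand(clique + [v], cand & adj[v])

    done = set()
    for v in sorted(adj, key=core.__getitem__):
        if core[v] + 1 > len(best):  # core-number pruning of the root vertex
            cand = {u for u in adj[v]
                    if u not in done and core[u] + 1 > len(best)}
            expand([v], cand)
        done.add(v)
    return best
```

For example, on a graph made of a 4-clique joined by a short path to a triangle, `max_clique(build_adj(edges))` returns the four clique vertices; the heuristic already finds them, so the core-number test prunes every branch of the exact search.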
