Distributed Maximal Clique Computation and Management

Maximal cliques are elementary substructures of a graph and are instrumental in graph analysis tasks such as the structural analysis of complex networks, graph clustering and community detection, network hierarchy detection, emerging pattern mining, and vertex importance measures. However, the number of maximal cliques is notoriously large, even for many small real-world graphs. This size problem makes both computing and managing the set of maximal cliques challenging. Many algorithms for computing maximal cliques have been proposed in the literature, but most of them are sequential algorithms that cannot scale because of the high complexity of the problem, while existing parallel algorithms are mostly immature and suffer in particular from skewed workload. As for managing the set of maximal cliques, which is essential given its large size, there is barely any efficient method for querying or updating it. In this paper, we first propose a distributed algorithm, built on a shared-nothing architecture, for computing the set of maximal cliques. We effectively address the problem of skewed workload distribution caused by high-degree vertices, which also drastically reduces the worst-case time complexity of computing maximal cliques in common real-world graphs. We then propose a set of fundamental query operations, together with efficient algorithms to process them, to support more efficient and effective analysis of the set of maximal cliques. Finally, we devise algorithms that efficiently maintain the set of maximal cliques when the underlying graph is updated. We verify the efficiency of our algorithms for computing, querying, and updating the set of maximal cliques on a range of real-world graphs from different application domains.
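
For readers unfamiliar with the underlying problem, the following minimal Python sketch enumerates the maximal cliques of a small undirected graph with the classical sequential Bron-Kerbosch procedure and a pivot rule. It is not the distributed algorithm proposed in the paper; the function name `maximal_cliques`, the adjacency-dictionary representation, and the toy graph are assumptions made only for illustration.

```python
from typing import Dict, FrozenSet, Iterator, Set

Graph = Dict[int, Set[int]]  # hypothetical representation: vertex -> set of neighbours


def maximal_cliques(adj: Graph) -> Iterator[FrozenSet[int]]:
    """Yield every maximal clique of an undirected graph (sequential sketch)."""

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
        # r: current clique, p: candidates that can extend r,
        # x: vertices already explored (used to reject non-maximal cliques).
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours in p, so that fewer
        # candidates need to be branched on.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)  # v is fully explored: move it from p to x
            x.add(v)

    yield from expand(set(), set(adj), set())


# Toy graph (illustrative only): a triangle 1-2-3 with a pendant edge 3-4.
toy_graph: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = set(maximal_cliques(toy_graph))
```

On this toy graph the procedure reports the triangle and the pendant edge as the only maximal cliques. Because every top-level branch explores the neighbourhood of one vertex, a single high-degree vertex can dominate the running time, which gives an intuition for the workload skew the abstract refers to.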
