Incremental maintenance of maximal cliques in a dynamic graph

We consider the maintenance of the set of all maximal cliques in a dynamic graph that changes through the addition or deletion of edges. We present nearly tight bounds on the magnitude of change in the set of maximal cliques when edges are added to the graph. We also present the first change-sensitive algorithm for incremental clique maintenance under edge additions: when the number of edges added is small, its runtime is proportional to the magnitude of the change in the set of maximal cliques. The algorithm can also be applied to the decremental case, in which edges are deleted from the graph. Experimental results show that these algorithms are efficient in practice and are two to three orders of magnitude faster than prior work.
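To make the "magnitude of change" concrete, the following minimal sketch computes which maximal cliques appear and which disappear when a batch of edges is added. It is not the change-sensitive algorithm described above: it simply recomputes all maximal cliques before and after the update using basic Bron-Kerbosch enumeration without pivoting, which is exactly the cost a change-sensitive method aims to avoid. The function names `maximal_cliques` and `clique_change` and the adjacency-dictionary representation are assumptions made for illustration.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph.

    adj maps each vertex to the set of its neighbours (hypothetical
    representation). Uses basic Bron-Kerbosch without pivoting.
    """
    found = set()

    def expand(r, p, x):
        # r: current clique, p: candidates to extend r, x: already-explored vertices
        if not p and not x:
            found.add(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def clique_change(adj, new_edges):
    """Return (new, subsumed) maximal cliques after adding new_edges.

    Baseline by full recomputation, NOT a change-sensitive method.
    The input graph is left unmodified.
    """
    before = maximal_cliques(adj)
    updated = {v: set(ns) for v, ns in adj.items()}
    for u, v in new_edges:
        updated.setdefault(u, set()).add(v)
        updated.setdefault(v, set()).add(u)
    after = maximal_cliques(updated)
    return after - before, before - after


if __name__ == "__main__":
    # Path a-b-c; adding edge (a, c) closes the triangle.
    graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    new, subsumed = clique_change(graph, [("a", "c")])
    print(sorted(sorted(c) for c in new))
    print(sorted(sorted(c) for c in subsumed))
```

In this small example one maximal clique is created and two are subsumed, so the change has size three even though only a single edge was added.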
