Incremental Lossless Graph Summarization

Given a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot? As large-scale graphs are prevalent, representing them concisely is essential for efficient storage and analysis. Lossless graph summarization is an effective graph-compression technique with many desirable properties. It aims to compactly represent the input graph as (a) a summary graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e., edges between supernodes), which provide a rough description, and (b) edge corrections, which fix the errors induced by the rough description. While a number of batch algorithms, suited for static graphs, have been developed for rapid and compact graph summarization, they are highly inefficient in terms of time and space when applied to dynamic graphs, which are common in practice. In this work, we propose MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. In response to each change in the input graph, MoSSo updates the output representation by repeatedly moving nodes among supernodes. MoSSo chooses the nodes to be moved and their destinations carefully but rapidly, based on several novel ideas. Through extensive experiments on 10 real graphs, we show that MoSSo is (a) Fast and 'any time': processing each change in near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude faster than running state-of-the-art batch methods, (b) Scalable: summarizing graphs with hundreds of millions of edges, requiring sub-linear memory during the process, and (c) Effective: achieving compression ratios comparable even to those of state-of-the-art batch methods.
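To make the output representation concrete, the following minimal Python sketch shows how an undirected graph could be restored exactly from a summary graph plus edge corrections. It illustrates only the decoding step implied by the definition above, not MoSSo's incremental update procedure. The function name `restore_edges`, the argument names, and the convention of splitting corrections into edges to add (`c_plus`) and edges to remove (`c_minus`) are assumptions made for illustration.

```python
from itertools import combinations, product


def restore_edges(supernodes, superedges, c_plus, c_minus):
    """Rebuild the undirected edge set encoded by a lossless summary.

    supernodes: dict mapping a supernode id to the set of nodes it contains
    superedges: iterable of (supernode id, supernode id) pairs
    c_plus:     edges missing from the rough description (to be added)
    c_minus:    edges wrongly implied by the rough description (to be removed)
    All names here are hypothetical, chosen for this sketch.
    """
    implied = set()
    for a, b in superedges:
        if a == b:
            # A self-loop superedge implies every pair inside the supernode.
            pairs = combinations(sorted(supernodes[a]), 2)
        else:
            # A superedge implies every pair across the two supernodes.
            pairs = product(supernodes[a], supernodes[b])
        implied.update(frozenset(p) for p in pairs)

    removed = {frozenset(e) for e in c_minus}
    added = {frozenset(e) for e in c_plus}
    return (implied - removed) | added


# Toy example: 4 original edges stored as 1 superedge and 2 corrections.
supernodes = {"A": {1, 2}, "B": {3, 4}, "C": {5}}
superedges = [("A", "B")]   # implies 1-3, 1-4, 2-3, 2-4
c_minus = [(2, 4)]          # edge 2-4 does not exist in the input graph
c_plus = [(1, 5)]           # edge 1-5 is not covered by any superedge

restored = restore_edges(supernodes, superedges, c_plus, c_minus)
```

In this toy case the restored edge set is 1-3, 1-4, 2-3, and 1-5. The summary is lossless because the corrections undo every error introduced by the superedge, and it is compact when the number of superedges plus corrections is smaller than the number of original edges.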
