Graph compression: The effect of clusters

This paper investigates the fundamental limits of compressing random graphs, and also discusses the compression of data on graphs. The graphs are assumed to have labelled vertices. The most basic example is the Erdős-Rényi model, which corresponds to the well-understood case of compressing i.i.d. bits. The focus here is on inhomogeneous random graphs that have clusters, or equivalently, block models. These correspond to mixtures of Erdős-Rényi models, for which basic tools for i.i.d.-like sources may not apply, and they capture a key feature of general network models. It is shown that the fundamental limit of lossless compression for such models takes different forms as the graph gets sparser. The paper also introduces connections between compression and clustering, discusses how each field can impact the other, and shows how clustering can help compress data on graphs.
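
To make the Erdős-Rényi baseline concrete, here is a minimal sketch, not taken from the paper. It assumes the standard G(n, p) model on n labelled vertices, in which each of the C(n, 2) possible edges is an independent Bernoulli(p) bit, so the entropy of the graph (and hence the lossless compression limit in expectation) is C(n, 2) times the binary entropy of p. The function names are hypothetical and chosen only for illustration.

```python
from math import comb, log2


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with the convention h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def er_graph_entropy_bits(n: int, p: float) -> float:
    """Entropy in bits of a labelled Erdős-Rényi graph G(n, p).

    Assumption: every one of the C(n, 2) vertex pairs is an edge
    independently with probability p, so the per-pair entropies add up.
    """
    return comb(n, 2) * binary_entropy(p)


if __name__ == "__main__":
    # A dense graph (p = 1/2) needs one bit per vertex pair;
    # a sparser graph needs fewer bits per pair.
    print(er_graph_entropy_bits(100, 0.5))
    print(er_graph_entropy_bits(100, 0.01))
```

In a block model the edge probabilities depend on hidden cluster labels, so the graph is a mixture of such models and this simple product formula no longer applies directly. That gap is what the paper studies.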
