Paper Information - Universal Graph Compression: Stochastic Block Models

Motivated by the prevalent data science applications of processing large-scale graph data such as social networks, web graphs, and biological networks, as well as the high I/O and communication costs of storing and transmitting such data, this paper investigates universal compression of data appearing in the form of a labeled graph. In particular, we consider a widely used random graph model, the stochastic block model (SBM), which captures the clustering effects in social networks. A universal graph compressor is proposed, which achieves the optimal compression rate for a wide family of SBMs with edge probabilities from $O(1)$ to $\Omega(1/n^{2-\epsilon})$ for any $0 < \epsilon < 1$. Existing universal compression techniques are developed mostly for stationary ergodic one-dimensional sequences with entropy linear in the number of variables. However, the adjacency matrix of an SBM has complex two-dimensional correlations and sublinear entropy in the sparse regime. These challenges are alleviated through a carefully designed transform that converts two-dimensional correlated data into almost i.i.d. blocks. The blocks are then compressed by a Krichevsky-Trofimov compressor, whose length analysis is generalized to arbitrarily correlated processes with identical marginals.
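The two-stage idea in the abstract (cut the adjacency matrix into blocks, then code the block sequence with a Krichevsky-Trofimov estimator) can be illustrated with a minimal Python sketch. It is not the paper's compressor: the function names `sample_sbm`, `off_diagonal_blocks`, and `kt_code_length` are hypothetical, the block size `k = 2` and the connection probabilities are arbitrary choices, only blocks strictly above the block diagonal are coded, and the result is the ideal KT code length in bits rather than the output of an actual arithmetic coder.

```python
import math
import random
from collections import Counter


def sample_sbm(n, labels, Q, rng):
    """Sample a symmetric 0/1 adjacency matrix without self-loops.

    Edge {i, j} is present with probability Q[labels[i]][labels[j]].
    """
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < Q[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj


def off_diagonal_blocks(adj, k):
    """Cut adj into k-by-k submatrices and return those strictly above
    the block diagonal, each flattened into a tuple (one symbol per block)."""
    n = len(adj)
    if n % k != 0:
        raise ValueError("this sketch assumes k divides n")
    m = n // k
    blocks = []
    for bi in range(m):
        for bj in range(bi + 1, m):
            blocks.append(tuple(adj[bi * k + r][bj * k + c]
                                for r in range(k) for c in range(k)))
    return blocks


def kt_code_length(symbols, alphabet_size):
    """Ideal code length (bits) under the KT sequential estimator:
    P(next = a | past of length t) = (count[a] + 1/2) / (t + alphabet_size/2)."""
    counts = Counter()
    bits = 0.0
    for t, s in enumerate(symbols):
        p = (counts[s] + 0.5) / (t + alphabet_size / 2)
        bits -= math.log2(p)
        counts[s] += 1
    return bits


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 60, 2                                  # assumed toy sizes
    labels = [rng.randrange(2) for _ in range(n)]  # i.i.d. community labels
    Q = [[0.10, 0.02], [0.02, 0.10]]               # assumed edge probabilities
    adj = sample_sbm(n, labels, Q, rng)
    blocks = off_diagonal_blocks(adj, k)
    coded = kt_code_length(blocks, 2 ** (k * k))
    raw = len(blocks) * k * k                      # one bit per matrix entry
    print(f"raw bits: {raw}, ideal KT bits: {coded:.1f}")
```

Each block is treated as a single symbol from an alphabet of size $2^{k^2}$, so the estimator adapts to the empirical block distribution without knowing the SBM parameters; that adaptivity is what "universal" refers to in the abstract.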
