Regular Decomposition of Large Graphs: Foundation of a Sampling Approach to Stochastic Block Model Fitting

We analyze the performance of regular decomposition, a method for compressing large, dense graphs. The method is inspired by Szemerédi's regularity lemma (SRL), a general structural result about large, dense graphs. In our method, a stochastic block model (SBM) is fitted by maximum likelihood to find a regular structure similar to the one predicted by SRL. The other ingredient of our method is Rissanen's minimum description length (MDL) principle. We consider scaling the algorithms to extremely large graphs by sampling a small subgraph. Continuing our previous work on the subject, we prove some claims that were previously observed experimentally. Our theoretical setting does not assume that the graph is generated by an SBM. Instead, the task is to find the SBM that is optimal, in the MDL sense, for modeling the given graph. This setting matches real-life situations in which no random generative model is appropriate. Our aim is to show that regular decomposition is a viable and robust method for large graphs arising, for example, in the Big Data area.
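The fitting and sampling steps described above can be illustrated with a small sketch. The Python code below is a simplified illustration written under our own assumptions. It is not the exact algorithm or code length analyzed in the paper.

The sketch works as follows:

- It fits a Bernoulli SBM to a 0/1 adjacency matrix using a k-means-like local search. Each node is moved to the block under which its own links are most likely.
- It scores each candidate number of blocks k with an assumed two-part code length, measured in nats. This is the data cost, plus n log k for the node labels, plus a BIC-style penalty for the k(k+1)/2 block densities.
- It keeps the k with the shortest description.
- It mimics the sampling idea by fitting only on the subgraph induced by a random node sample, then placing each remaining node using only its links to the sampled nodes.

All function names (block_densities, pair_cost, node_cost, fit_sbm, description_length, regular_decomposition, classify_from_sample) and the planted two-block demo graph are hypothetical.

```python
import math
import random

# Simplified sketch of SBM fitting with MDL model selection; not the authors' code.
EPS = 1e-9  # keep densities away from 0 and 1 so log() stays finite


def block_densities(adj, labels, k):
    """Edge density between every pair of blocks (undirected, no self-loops)."""
    sizes = [labels.count(a) for a in range(k)]
    edges = [[0] * k for _ in range(k)]
    for u in range(len(adj)):
        for v in range(u + 1, len(adj)):
            if adj[u][v]:
                a, b = labels[u], labels[v]
                edges[a][b] += 1
                if a != b:
                    edges[b][a] += 1
    dens = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            pairs = sizes[a] * (sizes[a] - 1) // 2 if a == b else sizes[a] * sizes[b]
            dens[a][b] = edges[a][b] / pairs if pairs else 0.0
    return dens


def pair_cost(linked, p):
    """-log-probability of one node pair being linked (or not) with density p."""
    p = min(max(p, EPS), 1 - EPS)
    return -math.log(p) if linked else -math.log(1 - p)


def node_cost(adj, v, block, nodes, labels, dens):
    """Cost of v's links to `nodes` (whose blocks are `labels`) if v sits in `block`."""
    return sum(pair_cost(adj[v][u], dens[block][lab])
               for u, lab in zip(nodes, labels) if u != v)


def fit_sbm(adj, k, iters=30, seed=0):
    """k-means-like maximum-likelihood search for a k-block partition."""
    rng = random.Random(seed)
    nodes = list(range(len(adj)))
    labels = [rng.randrange(k) for _ in nodes]
    for _ in range(iters):
        dens = block_densities(adj, labels, k)
        # move every node to the block that makes its own links most likely
        new = [min(range(k), key=lambda a: node_cost(adj, v, a, nodes, labels, dens))
               for v in nodes]
        if new == labels:
            break
        labels = new
    return labels, block_densities(adj, labels, k)


def description_length(adj, labels, dens, k):
    """Assumed, simplified two-part code length in nats (not the paper's exact code)."""
    n = len(adj)
    data = sum(pair_cost(adj[u][v], dens[labels[u]][labels[v]])
               for u in range(n) for v in range(u + 1, n))
    label_cost = n * math.log(k)  # name one of k blocks for each node
    param_cost = 0.5 * (k * (k + 1) / 2) * math.log(n * (n - 1) / 2)  # BIC-style
    return data + label_cost + param_cost


def regular_decomposition(adj, k_max, restarts=3):
    """Fit k = 1..k_max and keep the partition with the shortest code length."""
    best = None
    for k in range(1, k_max + 1):
        for seed in range(restarts):
            labels, dens = fit_sbm(adj, k, seed=seed)
            dl = description_length(adj, labels, dens, k)
            if best is None or dl < best[0]:
                best = (dl, k, labels, dens)
    return best


def classify_from_sample(adj, v, sample, sample_labels, dens):
    """Assign an out-of-sample node v using only its links to the sampled nodes."""
    return min(range(len(dens)),
               key=lambda a: node_cost(adj, v, a, sample, sample_labels, dens))


if __name__ == "__main__":
    # hypothetical demo graph: two planted blocks of 30 nodes each
    rng = random.Random(1)
    truth = [0] * 30 + [1] * 30
    adj = [[0] * 60 for _ in range(60)]
    for u in range(60):
        for v in range(u + 1, 60):
            p = 0.8 if truth[u] == truth[v] else 0.1
            adj[u][v] = adj[v][u] = int(rng.random() < p)
    sample = rng.sample(range(60), 30)  # fit only on a small node sample
    sub = [[adj[u][v] for v in sample] for u in sample]
    dl, k, sample_labels, dens = regular_decomposition(sub, k_max=4)
    rest = [v for v in range(60) if v not in sample]
    print("k =", k, "code length =", round(dl, 1))
    print([classify_from_sample(adj, v, sample, sample_labels, dens) for v in rest])
```

Local search from a random starting partition can stall in a poor local optimum, which is why the sketch tries several seeds for each k. The dense list-of-lists adjacency matrix is used only for brevity; the graphs the abstract targets would call for a more economical representation.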
