Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models

We present an efficient algorithm for the inference of stochastic block models in large networks. The algorithm can be used as an optimized Markov chain Monte Carlo (MCMC) method, with a fast mixing time and a much reduced susceptibility to getting trapped in metastable states, or as a greedy agglomerative heuristic, with an almost linear O(N ln² N) complexity, where N is the number of nodes in the network, independent of the number of blocks being inferred. We show that the heuristic is capable of delivering results which are indistinguishable from the more exact and numerically expensive MCMC method in many artificial and empirical networks, despite being much faster. The method is entirely unbiased towards any specific mixing pattern, and in particular it does not favor assortative community structures.
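
To make the idea of a greedy agglomerative heuristic concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes the sparse-limit entropy of the non-degree-corrected block model, S = E - (1/2) Σ_rs e_rs ln(e_rs / (n_r n_s)), as the objective; the abstract does not state this formula. The function names `block_entropy` and `greedy_merge` are hypothetical, and the brute-force search over all block pairs is far slower than the near-linear complexity quoted above.

```python
import math
from itertools import combinations

def block_entropy(edges, b):
    """Assumed objective: S = E - 1/2 * sum_rs e_rs * ln(e_rs / (n_r * n_s))."""
    n = {}  # n[r]: number of nodes in block r
    for r in b.values():
        n[r] = n.get(r, 0) + 1
    e = {}  # e[(r, s)]: edge count between r and s; e[(r, r)] is twice the internal edges
    for u, v in edges:
        r, s = b[u], b[v]
        e[(r, s)] = e.get((r, s), 0) + 1
        e[(s, r)] = e.get((s, r), 0) + 1
    return len(edges) - 0.5 * sum(
        m * math.log(m / (n[r] * n[s])) for (r, s), m in e.items()
    )

def greedy_merge(edges, nodes, B):
    """Start from one block per node and repeatedly apply the cheapest merge.

    Brute force for illustration only: every pair of blocks is evaluated
    at every step, so this does not scale to large networks.
    """
    b = {v: v for v in nodes}
    while len(set(b.values())) > B:
        best = None
        for r, s in combinations(sorted(set(b.values())), 2):
            trial = {v: (r if x == s else x) for v, x in b.items()}
            S = block_entropy(edges, trial)
            if best is None or S < best[0]:
                best = (S, trial)
        b = best[1]
    return b

# Toy network: two triangles joined by a single edge.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
planted = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
result = greedy_merge(edges, nodes, B=2)
```

Because the objective depends only on the edge counts e_rs and block sizes n_r, it treats assortative and non-assortative patterns alike, which is the property the abstract emphasizes. The sketch makes no claim that the greedy result matches the planted partition on every input.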
