A review of stochastic block models and extensions for graph clustering

Model-based clustering of graphs, also known as block modelling, has developed rapidly over the last ten years or so. We review the approaches and extensions proposed in this area, organised by the type of graph, the clustering approach, the inference approach, and whether the number of groups is selected or estimated. We also review models that combine block modelling with topic modelling and/or longitudinal modelling, focusing on how these models handle multiple types of data. We summarise and compare how the different approaches cope with various issues, to give practitioners a concise overview of the current state of this literature.


[68]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[69]  Matthieu Latapy,et al.  Computing Communities in Large Networks Using Random Walks , 2004, J. Graph Algorithms Appl..

[70]  Edoardo M. Airoldi,et al.  Geometric Representations of Random Hypergraphs , 2009 .

[71]  M. Narasimha Murty,et al.  On Finding the Natural Number of Topics with Latent Dirichlet Allocation: Some Observations , 2010, PAKDD.

[72]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[73]  Yee Whye Teh,et al.  Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes , 2004, NIPS.

[74]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockstructures , 2001 .

[75]  Yuchung J. Wang,et al.  Stochastic Blockmodels for Directed Graphs , 1987 .

[76]  Dino Pedreschi,et al.  Machine Learning: ECML 2004 , 2004, Lecture Notes in Computer Science.

[77]  Emmanuel Abbe,et al.  Community detection and stochastic block models: recent developments , 2017, Found. Trends Commun. Inf. Theory.

[78]  David M. Blei,et al.  Efficient Online Inference for Bayesian Nonparametric Relational Models , 2013, NIPS.

[79]  Tiago P. Peixoto Nonparametric weighted stochastic block models. , 2017, Physical review. E.

[80]  Martin Rosvall,et al.  An information-theoretic framework for resolving community structure in complex networks , 2007, Proceedings of the National Academy of Sciences.

[81]  Boleslaw K. Szymanski,et al.  On community structure in complex networks: challenges and opportunities , 2019, Applied Network Science.

[82]  David M. Blei,et al.  Hierarchical relational models for document networks , 2009, 0909.4331.

[83]  Jorma Rissanen,et al.  Minimum Description Length Principle , 2010, Encyclopedia of Machine Learning.

[84]  Darren J. Wilkinson,et al.  A Social Network Analysis of Articles on Social Network Analysis , 2018, ArXiv.

[85]  Frans A. Oliehoek,et al.  The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems , 2015, AAAI Fall Symposia.

[86]  Roger Guimerà,et al.  Missing and spurious interactions and the reconstruction of complex networks , 2009, Proceedings of the National Academy of Sciences.

[87]  Andrew McCallum,et al.  Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email , 2007, J. Artif. Intell. Res..

[88]  Eric P. Xing,et al.  Document hierarchies from text and links , 2012, WWW.

[89]  Longbing Cao,et al.  Copula Mixed-Membership Stochastic Blockmodel , 2016, IJCAI.

[90]  Peter D. Hoff,et al.  Latent Space Approaches to Social Network Analysis , 2002 .

[91]  Dianhui Wang,et al.  AI 2011: Advances in Artificial Intelligence - 24th Australasian Joint Conference, Perth, Australia, December 5-8, 2011. Proceedings , 2011, Australasian Conference on Artificial Intelligence.

[92]  Hong Qin,et al.  Corrected Bayesian Information Criterion for Stochastic Block Models , 2016, Journal of the American Statistical Association.

[93]  Model-based clustering for random hypergraphs , 2018, 1808.05185.

[94]  Yihong Gong,et al.  Detecting communities and their evolutions in dynamic social networks—a Bayesian approach , 2011, Machine Learning.

[95]  Tiago P. Peixoto Nonparametric Bayesian inference of the microcanonical stochastic block model. , 2016, Physical review. E.

[96]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[97]  Charles Bouveyron,et al.  The dynamic stochastic topic block model for dynamic networks with textual edges , 2018, Statistics and Computing.

[98]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[99]  Morten Mørup,et al.  Bayesian Community Detection , 2012, Neural Computation.

[100]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[101]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[102]  Kohei Hayashi,et al.  A Tractable Fully Bayesian Method for the Stochastic Block Model , 2016, ArXiv.

[103]  Xiaoran Yan,et al.  Bayesian model selection of stochastic block models , 2016, 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[104]  Mingyuan Zhou,et al.  Infinite Edge Partition Models for Overlapping Community Detection and Link Prediction , 2015, AISTATS.

[105]  C. Matias,et al.  Estimation and clustering in a semiparametric Poisson process stochastic block model for longitudinal networks , 2015 .

[106]  Franck Picard,et al.  A mixture model for random graphs , 2008, Stat. Comput..

[107]  Ken Wakita,et al.  Finding community structure in mega-scale social networks: [extended abstract] , 2007, WWW '07.

[108]  Yiming Yang,et al.  The Enron Corpus: A New Dataset for Email Classi(cid:12)cation Research , 2004 .

[109]  P. Bickel,et al.  Likelihood-based model selection for stochastic block models , 2015, 1502.02069.

[110]  Fabrice Rossi,et al.  Discovering patterns in time-varying graphs: a triclustering approach , 2015, Advances in Data Analysis and Classification.

[111]  M E J Newman,et al.  Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[112]  Tiago P. Peixoto Hierarchical block structures and high-resolution model selection in large networks , 2013, ArXiv.

[113]  Bin Yu,et al.  Spectral clustering and the high-dimensional stochastic blockmodel , 2010, 1007.1684.

[114]  Eric P. Xing,et al.  Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream , 2010, UAI.

[115]  Michael J. Freedman,et al.  Scalable Inference of Overlapping Communities , 2012, NIPS.

[116]  Hongyuan Zha,et al.  Probabilistic models for discovering e-communities , 2006, WWW '06.

[117]  Peter Neal,et al.  Dynamic stochastic block models: parameter estimation and detection of changes in community structure , 2018, Stat. Comput..

[118]  Tiago P Peixoto,et al.  Parsimonious module inference in large networks. , 2012, Physical review letters.

[119]  M. Newman Coauthorship networks and patterns of scientific collaboration , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[120]  Tiago P. Peixoto Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. , 2015, Physical review. E, Statistical, nonlinear, and soft matter physics.

[121]  Christophe Ambroise,et al.  Variational Bayesian inference and complexity control for stochastic block models , 2009, 0912.2873.

[122]  Boleslaw K. Szymanski,et al.  Community Detection via Maximization of Modularity and Its Variants , 2014, IEEE Transactions on Computational Social Systems.

[123]  Yan Liu,et al.  Topic-link LDA: joint models of topic and author community , 2009, ICML '09.

[124]  G. Roberts,et al.  Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models , 2007, 0710.4228.

[125]  M. Newman,et al.  The structure of scientific collaboration networks. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[126]  Fabrice Rossi,et al.  A Triclustering Approach for Time Evolving Graphs , 2012, 2012 IEEE 12th International Conference on Data Mining Workshops.

[127]  Jean-Charles Delvenne,et al.  The many facets of community detection in complex networks , 2016, Applied Network Science.

[128]  Cristopher Moore,et al.  Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[129]  Tiago P. Peixoto Model selection and hypothesis testing for large-scale network models with overlapping groups , 2014, ArXiv.

[130]  Neil J. Hurley,et al.  Computational Statistics and Data Analysis , 2022 .

[131]  Cristopher Moore,et al.  Model selection for degree-corrected block models , 2012, Journal of statistical mechanics.

[132]  Gesine Reinert,et al.  Estimating the number of communities in a network , 2016, Physical review letters.

[133]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[134]  P. Holland,et al.  An Exponential Family of Probability Distributions for Directed Graphs , 1981 .

[135]  W. Zachary,et al.  An Information Flow Model for Conflict and Fission in Small Groups , 1977, Journal of Anthropological Research.

[136]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[137]  Longbing Cao,et al.  Dynamic Infinite Mixed-Membership Stochastic Blockmodel , 2013, IEEE Transactions on Neural Networks and Learning Systems.