A Scalable Redefined Stochastic Blockmodel

The stochastic blockmodel (SBM) is a widely used statistical model for network representation, valued for its interpretability, expressiveness, generalization, and flexibility, and it has become prominent in network science in recent years. However, learning an optimal SBM for a given network is NP-hard. This severely limits the application of SBMs to large-scale networks, because existing SBMs and their learning methods incur substantial computational overhead. Reducing the cost of SBM learning so that it scales to large networks, while preserving the good theoretical properties of the SBM, remains an open problem. In this work, we address this task from the perspective of model redefinition. We propose a redefined SBM based on the Poisson distribution, together with a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that the proposed method significantly outperforms state-of-the-art methods in striking a reasonable trade-off between accuracy and scalability.
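To make the idea of an SBM with a Poisson distribution concrete, the following is a minimal sketch of the log-likelihood of a generic Poisson SBM for an undirected network without self-loops. It is not the redefined model or the block-wise learning algorithm proposed in this work, whose details are not given in the abstract. The function name `poisson_sbm_log_likelihood` and the parameters `adj`, `labels`, and `rates` are assumptions introduced purely for illustration.

```python
import math


def poisson_sbm_log_likelihood(adj, labels, rates):
    """Log-likelihood of an undirected network under a generic Poisson SBM.

    Illustrative assumptions (not taken from the paper):
      adj[i][j]   -- non-negative integer edge count between nodes i and j
      labels[i]   -- block index assigned to node i
      rates[r][s] -- expected edge count (> 0) between a node in block r
                     and a node in block s
    """
    n = len(adj)
    total = 0.0
    # Each unordered node pair is counted once; self-loops are ignored.
    for i in range(n):
        for j in range(i + 1, n):
            lam = rates[labels[i]][labels[j]]
            a = adj[i][j]
            # Poisson log-pmf: a*log(lam) - lam - log(a!)
            total += a * math.log(lam) - lam - math.lgamma(a + 1)
    return total


# Toy network: edges 0-1 and 2-3, i.e. two obvious blocks.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
rates = [[0.9, 0.1], [0.1, 0.9]]

good = poisson_sbm_log_likelihood(adj, [0, 0, 1, 1], rates)
bad = poisson_sbm_log_likelihood(adj, [0, 1, 0, 1], rates)
```

On this toy network, the assignment that groups connected nodes together (`good`) receives a higher log-likelihood than the one that splits them (`bad`). The double loop visits every node pair, so this naive evaluation is quadratic in the number of nodes, which illustrates the kind of cost that makes scalability a concern for large networks.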
