Manifold Regularized Stochastic Block Model

Stochastic block models (SBMs) play an essential role in network analysis, especially in tasks related to unsupervised learning (clustering). Many SBM-based approaches have been proposed to uncover network clusters by maximizing the block-wise posterior probability of generating the edges that connect vertices. However, none of them can infer the cluster preference of each vertex by simultaneously modeling block-wise edge structure, vertex features, and pairwise vertex similarities. To fill this void, we propose a novel SBM, dubbed the manifold regularized stochastic block model (MrSBM), for unsupervised learning on network data. Besides modeling edges that lie within blocks or connect different blocks, MrSBM models vertex features using the probabilities of vertex-cluster preference and feature-cluster contribution. In addition, MrSBM generates the manifold similarity of each pair of vertices from the inferred vertex-cluster preference, so that the inferred cluster preference can capture vertex comparability on the manifold. We design a novel generative process for network data, from which we specify the model structure and formulate the network clustering problem with a novel likelihood function. To ensure that MrSBM learns the optimal cluster preference for each vertex, we derive an effective Expectation-Maximization-based algorithm for model fitting. MrSBM has been tested on five real-world network datasets and compared with both classical and state-of-the-art approaches to network clustering. The competitive experimental results validate the effectiveness of MrSBM.
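
To make the three modeled components concrete, the sketch below scores a candidate vertex-cluster preference against edges, vertex features, and pairwise manifold similarity at once. It is an illustration under stated assumptions, not the likelihood defined in the paper: the Bernoulli form of each term, the equal weighting of the three terms, and the names `Z` (vertex-cluster preference), `B` (block-wise edge probability), `T` (feature-cluster contribution), and `S` (manifold similarity scaled to [0, 1]) are all hypothetical choices made for this example.

```python
import math


def bernoulli_loglik(observed, prob, eps=1e-9):
    """Log-likelihood of `observed` in [0, 1] under a Bernoulli with mean `prob`."""
    p = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return observed * math.log(p) + (1.0 - observed) * math.log(1.0 - p)


def joint_log_likelihood(A, X, S, Z, B, T):
    """Sum of edge, feature, and manifold-similarity terms (assumed forms).

    A: n x n binary adjacency matrix (undirected, no self-loops)
    X: n x d binary vertex-feature matrix
    S: n x n pairwise manifold similarity, scaled to [0, 1]
    Z: n x k vertex-cluster preference, each row sums to 1
    B: k x k block-wise edge probabilities
    T: k x d feature-cluster contribution probabilities
    """
    n, k, d = len(Z), len(B), len(T[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Edge term: probability of an edge given both vertices' preferences.
            p_edge = sum(Z[i][r] * B[r][s] * Z[j][s]
                         for r in range(k) for s in range(k))
            total += bernoulli_loglik(A[i][j], p_edge)
            # Manifold term: vertices with similar preferences should be similar.
            p_sim = sum(Z[i][r] * Z[j][r] for r in range(k))
            total += bernoulli_loglik(S[i][j], p_sim)
        for m in range(d):
            # Feature term: mix feature-cluster contributions by preference.
            p_feat = sum(Z[i][r] * T[r][m] for r in range(k))
            total += bernoulli_loglik(X[i][m], p_feat)
    return total


# Toy network: vertices {0, 1} and {2, 3} form two clusters.
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
S = A  # here the similarity simply mirrors the adjacency
B = [[0.9, 0.1], [0.1, 0.9]]
T = [[0.9, 0.1], [0.1, 0.9]]
Z_good = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
Z_bad = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]

score_good = joint_log_likelihood(A, X, S, Z_good, B, T)
score_bad = joint_log_likelihood(A, X, S, Z_bad, B, T)
print(score_good > score_bad)
```

On the toy network, a preference matrix that agrees with the edges, features, and similarities scores higher than one that splits each cluster. An EM-style fit would alternate between updating `Z` and updating `B` and `T` to increase such an objective; that fitting loop is omitted here.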
