Community detection with nodal information: Likelihood and its variational approximation

Community detection is one of the fundamental problems in the study of network data. Most existing community detection approaches take only edge information as input, so their output can be suboptimal when nodal information is available. In such cases, it is desirable to leverage nodal information to improve community detection accuracy. Toward this goal, we propose a flexible network model that incorporates nodal information and develop likelihood-based inference methods. For the proposed methods, we establish favorable asymptotic properties and provide efficient algorithms for computation. Numerical experiments on a variety of simulated and real network data sets show the effectiveness of our methods in utilizing nodal information.
