Detecting Overlapping and Correlated Communities without Pure Nodes: Identifiability and Algorithm

Many machine learning problems involve networks that encode relational data between entities, and a key unsupervised learning task is detecting communities in such networks. We adopt the mixed-membership stochastic blockmodel as the underlying probabilistic model and give conditions under which the memberships of a subset of nodes can be uniquely identified. Our method starts by constructing a second-order graph moment, which can be shown to converge to a specific product of the true parameters as the size of the network increases. To recover the true membership parameters, we formulate an optimization problem using insights from convex geometry. We show that if the true memberships satisfy a so-called sufficiently scattered condition, then solving the proposed problem correctly identifies the ground truth. We also propose an efficient community detection algorithm that is significantly faster than prior work and has better convergence properties. Experiments on synthetic and real data support the validity of the proposed learning framework for network data.
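The convergence claim for the second-order moment can be illustrated with a small simulation. The sketch below is not the paper's construction; it assumes one simple moment built from three disjoint node groups, a hypothetical community interaction matrix `B`, and Dirichlet-distributed memberships. Under the mixed-membership stochastic blockmodel, edges between groups are independent Bernoulli draws with probabilities `theta_i^T B theta_j`, so the normalized cross-product of two adjacency blocks should approach the corresponding product of model parameters as the shared group grows. All names and parameter values here are illustrative assumptions.

```python
import numpy as np


def sample_block(rng, theta_rows, theta_cols, B):
    """Sample one adjacency block between two disjoint node groups.

    Entry (i, j) is Bernoulli with probability theta_rows[i] @ B @ theta_cols[j].
    Returns the sampled 0/1 block and the matrix of edge probabilities.
    """
    P = theta_rows @ B @ theta_cols.T
    A = (rng.random(P.shape) < P).astype(float)
    return A, P


def second_order_moment_demo(n0=4000, n1=50, n2=50, k=3, seed=0):
    """Compare an empirical second-order moment with its population value.

    Assumed setup (not taken from the paper): groups S0, S1, S2 are disjoint,
    and the moment is A[S0, S1]^T A[S0, S2] / |S0|.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical interaction matrix: dense within, sparse across communities.
    B = np.full((k, k), 0.2) + 0.6 * np.eye(k)

    # Mixed memberships: each row lies on the probability simplex.
    alpha = np.full(k, 0.5)
    theta0 = rng.dirichlet(alpha, size=n0)
    theta1 = rng.dirichlet(alpha, size=n1)
    theta2 = rng.dirichlet(alpha, size=n2)

    A01, P01 = sample_block(rng, theta0, theta1, B)
    A02, P02 = sample_block(rng, theta0, theta2, B)

    # Empirical moment from observed edges, and its expectation given the
    # memberships: theta1 B^T (theta0^T theta0 / n0) B theta2^T.
    M_hat = A01.T @ A02 / n0
    M = P01.T @ P02 / n0

    rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
    return M_hat, M, rel_err


if __name__ == "__main__":
    for n0 in (250, 1000, 4000):
        _, _, err = second_order_moment_demo(n0=n0)
        print(f"n0={n0:5d}  relative error={err:.4f}")
```

Running the script with increasing `n0` should show the relative error shrinking, which is the behavior the abstract describes. Recovering the memberships from such a moment is the separate optimization step and is not attempted here.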