A tensor approach to learning mixed membership community models

Community detection is the task of identifying hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities, such as the stochastic block model. In this paper, we remove this restriction and provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed the mixed membership Dirichlet model, first introduced by Airoldi et al. This model allows nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case. We propose a unified approach to learning these models via a tensor spectral decomposition method. Our estimator is based on a low-order moment tensor of the observed network, consisting of 3-star counts. Our learning method is fast and is based on simple linear algebraic operations, such as singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method. As an important special case, our results match the best known scaling requirements for the (homogeneous) stochastic block model.
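To make the phrase "tensor power iterations" concrete, here is a minimal sketch in Python with NumPy. It is not the paper's estimator: it assumes a symmetric third-order tensor that already has an orthogonal decomposition with positive weights (the kind of object one would hope to obtain after an SVD-based whitening step), and it skips the construction of the moment tensor from 3-star counts entirely. The function names `tensor_power_iteration` and `decompose` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Return one (eigenvalue, eigenvector) pair of a symmetric 3-way tensor T."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Contract T with v along two modes: w_i = sum_{j,k} T[i,j,k] v_j v_k
        w = np.einsum("ijk,j,k->i", T, v, v)
        v = w / np.linalg.norm(w)
    # Eigenvalue estimate: T contracted with v along all three modes
    lam = np.einsum("ijk,i,j,k->", T, v, v, v)
    return lam, v

def decompose(T, k, n_iter=100):
    """Recover k components by repeated power iteration and deflation."""
    T = T.copy()
    weights, vectors = [], []
    for r in range(k):
        lam, v = tensor_power_iteration(T, n_iter=n_iter, seed=r)
        weights.append(lam)
        vectors.append(v)
        # Deflate: subtract the rank-one term that was just recovered
        T -= lam * np.einsum("i,j,k->ijk", v, v, v)
    return np.array(weights), np.array(vectors)

# Toy input (an assumption for illustration): orthonormal components
# with positive weights, standing in for a whitened moment tensor.
d, k = 5, 3
Q, _ = np.linalg.qr(np.random.default_rng(42).standard_normal((d, d)))
true_vectors = Q[:, :k].T
true_weights = np.array([3.0, 2.0, 1.0])
T = sum(w * np.einsum("i,j,k->ijk", a, a, a)
        for w, a in zip(true_weights, true_vectors))

weights, vectors = decompose(T, k)
```

On this noiseless toy tensor the recovered weights and vectors should match the planted ones up to ordering. With an empirical tensor estimated from a finite network, a practical version would likely need multiple random restarts and a perturbation analysis, which this sketch does not attempt.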
