Learning Mixtures of Discrete Product Distributions using Spectral Decompositions

We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have sample/time complexity that is exponential in the number of mixtures. In this paper, we introduce a polynomial time/sample complexity method for learning a mixture of r discrete product distributions overf1; 2;:::;‘g n , for general ‘ and r. We show that our approach is consistent and further provide finite sample guarantees. We use recently developed techniques from tensor decompositions for moment matching. A crucial step in these approaches is to construct certain tensors with low-rank spectral decompositions. These tensors are typically estimated from the sample moments. The main challenge in learning mixtures of discrete product distributions is that the corresponding low-rank tensors cannot be obtained directly from the sample moments. Instead, we need to estimate a low-rank matrix using only off-diagonal entries, and estimate a tensor using a few linear measurements. We give an alternating minimization based method to estimate the low-rank matrix, and formulate the tensor estimation problem as a least-squares problem.

[1]  Sanjeev Arora,et al.  Learning mixtures of arbitrary gaussians , 2001, STOC '01.

[2]  Joydeep Ghosh,et al.  Noisy Matrix Completion Using Alternating Minimization , 2013, ECML/PKDD.

[3]  Devavrat Shah,et al.  Budget-optimal crowdsourcing using low-rank matrix approximations , 2011, 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[4]  Yishay Mansour,et al.  Estimating a mixture of two product distributions , 1999, COLT '99.

[5]  A. P. Dawid,et al.  Maximum Likelihood Estimation of Observer Error‐Rates Using the EM Algorithm , 1979 .

[6]  Devavrat Shah,et al.  Efficient crowdsourcing for multi-class labeling , 2013, SIGMETRICS '13.

[7]  Joel A. Tropp,et al.  User-Friendly Tail Bounds for Sums of Random Matrices , 2010, Found. Comput. Math..

[8]  Eran Halperin,et al.  An Efficient and Accurate Graph-Based Approach to Detect Population Substructure , 2007, RECOMB.

[9]  Satish Rao,et al.  Learning Mixtures of Product Distributions Using Correlations and Independence , 2008, COLT.

[10]  Panagiotis G. Ipeirotis,et al.  Get another label? improving data quality and data mining using multiple, noisy labelers , 2008, KDD.

[11]  Sham M. Kakade,et al.  Learning mixtures of spherical gaussians: moment methods and spectral decompositions , 2012, ITCS '13.

[12]  Sanjeev Arora,et al.  Learning Topic Models -- Going beyond SVD , 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[13]  Frank McSherry,et al.  Spectral partitioning of random graphs , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[14]  Ronitt Rubinfeld,et al.  On the learnability of discrete distributions , 1994, STOC '94.

[15]  Devavrat Shah,et al.  Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems , 2011, Oper. Res..

[16]  Laurent Massoulié,et al.  Distributed user profiling via spectral methods , 2010, SIGMETRICS '10.

[17]  Prateek Jain,et al.  Low-rank matrix completion using alternating minimization , 2012, STOC '13.

[18]  Anima Anandkumar,et al.  Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..

[19]  Navin Goyal,et al.  Efficient Learning of Simplices , 2012, COLT.

[20]  Pablo A. Parrilo,et al.  Diagonal and Low-Rank Matrix Decompositions, Correlation Matrices, and Ellipsoid Fitting , 2012, SIAM J. Matrix Anal. Appl..

[21]  R. Preston McAfee,et al.  Who moderates the moderators?: crowdsourcing abuse detection in user-generated content , 2011, EC '11.

[22]  Pietro Perona,et al.  Inferring Ground Truth from Subjective Labelling of Venus Images , 1994, NIPS.

[23]  Santosh S. Vempala,et al.  A spectral algorithm for learning mixture models , 2004, J. Comput. Syst. Sci..

[24]  Sanjeev Arora,et al.  Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders , 2012, Algorithmica.

[25]  Yuval Rabani,et al.  Learning mixtures of arbitrary distributions over large discrete domains , 2012, ITCS.

[26]  Eran Halperin,et al.  A rigorous analysis of population stratification with limited data , 2007, SODA '07.

[27]  J. Feldman,et al.  Learning mixtures of product distributions over discrete domains , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[28]  S. Hui,et al.  Evaluation of diagnostic tests without gold standards , 1998, Statistical methods in medical research.

[29]  Dimitris Achlioptas,et al.  On Spectral Learning of Mixtures of Distributions , 2005, COLT.

[30]  Dean Alderucci A SPECTRAL ALGORITHM FOR LEARNING HIDDEN MARKOV MODELS THAT HAVE SILENT STATES , 2015 .

[31]  Ankur Moitra,et al.  Settling the Polynomial Learnability of Mixtures of Gaussians , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[32]  Anima Anandkumar,et al.  A Method of Moments for Mixture Models and Hidden Markov Models , 2012, COLT.