Provable Learning of Overcomplete Latent Variable Models: Semi-supervised and Unsupervised Settings

We provide guarantees for learning latent variable models, with an emphasis on the overcomplete regime, where the dimensionality of the latent space can exceed the observed dimensionality. In particular, we consider spherical Gaussian mixtures and multiview mixture models. Our algorithm is based on the method of moments and employs tensor decomposition for learning. In the semi-supervised setting, we exploit label or prior information to obtain a rough estimate of the model parameters, and then refine it with the tensor method on unlabeled samples. We establish learning guarantees when the number of components scales as k = o(d^{p/2}), where d is the observed dimension and p is the order of the observed moment employed in the tensor method. In the unsupervised setting, we propose a simple initialization algorithm based on the SVD of tensor slices, and provide guarantees under the stricter condition that k ≤ Cd (where the constant C can be larger than 1). We also provide tight sample complexity bounds through novel covering arguments.
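To make the two-stage unsupervised procedure concrete, below is a minimal, illustrative sketch in Python: form the empirical third-order moment tensor, initialize a component estimate from the SVD of a random slice of that tensor, and refine it with rank-1 tensor power updates. This is a sketch under simplifying assumptions, not the paper's exact algorithm: the function names (empirical_third_moment, svd_slice_init, power_update) and all parameter choices are our own, and the σ² moment corrections for Gaussian noise, deflation across components, and the semi-supervised label-based initialization are all omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_third_moment(X):
    """M3 ≈ E[x ⊗ x ⊗ x], estimated from samples X of shape (n, d).
    (Noise-correction terms for spherical Gaussians are omitted here.)"""
    return np.einsum('ni,nj,nk->ijk', X, X, X, optimize=True) / X.shape[0]

def svd_slice_init(T, theta):
    """Initialization from an SVD of a random tensor slice: contracting T
    along the third mode with a random direction theta gives
    T(I, I, theta) = sum_j w_j (theta . a_j) a_j a_j^T,
    whose top left singular vector is correlated with some component a_j."""
    M = np.einsum('ijk,k->ij', T, theta)
    U, _, _ = np.linalg.svd(M)
    return U[:, 0]

def power_update(T, a, iters=50):
    """Rank-1 tensor power iteration: a <- T(I, a, a) / ||T(I, a, a)||."""
    for _ in range(iters):
        a = np.einsum('ijk,j,k->i', T, a, a)
        a /= np.linalg.norm(a)
    return a

# Synthetic, mildly overcomplete mixture (k > d) with near-zero noise, so the
# uncorrected third moment is a clean illustration of T = sum_j w_j a_j^{⊗3}.
d, k, n = 10, 12, 100_000
A = rng.standard_normal((d, k))
A /= np.linalg.norm(A, axis=0)              # unit-norm component means a_j
labels = rng.integers(k, size=n)
X = A[:, labels].T + 0.01 * rng.standard_normal((n, d))

T = empirical_third_moment(X)
a0 = svd_slice_init(T, rng.standard_normal(d))
a_hat = power_update(T, a0)
# Correlation with the closest true component (should be close to 1).
print("max |<a_hat, a_j>| =", np.abs(A.T @ a_hat).max())
```

Because the components are non-orthogonal in the overcomplete regime, the slice-SVD output is only a rough initializer; the power updates are what drive the estimate toward an individual component, which mirrors the initialize-then-refine structure described in the abstract.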
