Learning Overcomplete Latent Variable Models through Tensor Methods

We provide guarantees for learning latent variable models, with an emphasis on the overcomplete regime, where the dimensionality of the latent space exceeds the observed dimensionality. In particular, we consider multiview mixtures, ICA, and sparse coding models. Our main tool is a new algorithm for tensor decomposition that works in the overcomplete regime. In the semi-supervised setting, we exploit label information to obtain a rough estimate of the model parameters, and then refine it using the tensor method on unlabeled samples. We establish learning guarantees when the number of components scales as k = o(d^{p/2}), where d is the observed dimension, and p is the order of the observed moment employed in the tensor method (usually p = 3, 4). In the unsupervised setting, a simple initialization algorithm based on SVD of the tensor slices is proposed, and the guarantees are provided under the stricter condition that k ≤ βd (where the constant β can be larger than 1). For the learning applications, we provide tight sample complexity bounds through novel covering arguments.
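
To make the two ingredients named above concrete, here is a minimal NumPy sketch, not the paper's exact algorithm: SVD of a random contraction (slice) of a symmetric third-order moment tensor for initialization, followed by alternating rank-1 (tensor power) updates for refinement. The function names (`svd_init`, `rank1_power_updates`), the trial counts, and the toy moment tensor in the demo are illustrative assumptions.

```python
import numpy as np

def svd_init(T, n_trials=50, rng=None):
    """SVD-based initialization for a symmetric 3rd-order tensor T (d x d x d).

    Contracts T along its third mode with a random Gaussian vector theta,
    forming the d x d slice T(I, I, theta), and takes its top singular vector.
    Repeating over random draws of theta yields candidate initializers that
    tend to correlate with the true components.
    """
    rng = np.random.default_rng(rng)
    d = T.shape[0]
    best_u, best_val = None, -np.inf
    for _ in range(n_trials):
        theta = rng.standard_normal(d)
        slice_ = np.einsum('ijk,k->ij', T, theta)    # T(I, I, theta)
        U, _, _ = np.linalg.svd(slice_)
        u = U[:, 0]
        val = np.einsum('ijk,i,j,k->', T, u, u, u)   # cubic form T(u, u, u)
        if val < 0:                                   # fix the SVD sign ambiguity
            u, val = -u, -val
        if val > best_val:
            best_u, best_val = u, val
    return best_u

def rank1_power_updates(T, u0, n_iter=100):
    """Alternating rank-1 (tensor power) updates: u <- T(I, u, u) / ||T(I, u, u)||."""
    u = u0 / np.linalg.norm(u0)
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('ijk,i,j,k->', T, u, u, u)        # recovered component weight
    return lam, u

if __name__ == "__main__":
    # Toy overcomplete example: k > d random unit components (hypothetical sizes).
    rng = np.random.default_rng(0)
    d, k = 25, 30
    A = rng.standard_normal((d, k))
    A /= np.linalg.norm(A, axis=0)                    # unit-norm columns a_j
    w = rng.uniform(1.0, 2.0, size=k)
    T = np.einsum('ik,jk,lk,k->ijl', A, A, A, w)      # sum_j w_j a_j (x) a_j (x) a_j
    u0 = svd_init(T, rng=rng)
    lam, u = rank1_power_updates(T, u0)
    print("best match with a true component:", np.max(np.abs(A.T @ u)))  # ~1 on success
```

The sketch recovers one component per initialization; recovering all k components would require repeated runs with deflation or simultaneous updates. For random incoherent components, as in the regime the abstract describes, the power updates typically converge near a true component rather than a spurious fixed point.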
