Provable sparse tensor decomposition

We propose a novel sparse tensor decomposition method, the tensor truncated power method, that incorporates variable selection into the estimation of the decomposition components. Sparsity is achieved via an efficient truncation step embedded in the tensor power iteration. Our method applies to a broad family of high dimensional latent variable models, including high dimensional Gaussian mixtures and mixtures of sparse regressions. We also conduct a thorough theoretical investigation. In particular, we show that the final decomposition estimator is guaranteed to achieve a local statistical rate, and we strengthen this to a global statistical rate by introducing a proper initialization procedure. In high dimensional regimes, the statistical rate obtained significantly improves on those established for existing non‐sparse decomposition methods. The empirical advantages of the tensor truncated power method are confirmed by extensive simulations and by two real applications: click‐through rate prediction and high dimensional gene clustering.
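
To make the idea of a truncation step embedded in the tensor power iteration concrete, the following is a minimal sketch for a single component of a symmetric third-order tensor. It is an illustration under stated assumptions, not the authors' implementation: the function names `truncate` and `truncated_power_iteration`, the fixed sparsity level `s`, the fixed iteration count, and the caller-supplied starting vector `u0` are all hypothetical choices, and the initialization procedure and the handling of multiple components described in the abstract are not reproduced here.

```python
import numpy as np


def truncate(v, s):
    """Keep the s largest-magnitude entries of v and set the rest to zero.

    Assumes s is a positive integer (hypothetical sparsity parameter).
    """
    if s >= v.size:
        return v.copy()
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]  # indices of the s largest |v_i|
    out[keep] = v[keep]
    return out


def truncated_power_iteration(T, s, u0, n_iter=50):
    """Sketch of a truncated power iteration for one sparse component.

    T  : (d, d, d) array, assumed (approximately) symmetric
    s  : number of nonzero entries to retain after each update
    u0 : starting vector supplied by the caller (assumption: reasonably
         close to the target component)
    Returns an estimated weight and a unit-norm sparse vector.
    """
    u = u0 / np.linalg.norm(u0)
    for _ in range(n_iter):
        # Power step: contract T with u along its last two modes.
        v = np.einsum("ijk,j,k->i", T, u, u)
        # Truncation step: enforce sparsity, then renormalize.
        v = truncate(v, s)
        norm = np.linalg.norm(v)
        if norm == 0.0:
            break
        u = v / norm
    weight = np.einsum("ijk,i,j,k->", T, u, u, u)
    return weight, u
```

On a noisy rank-one tensor whose component is supported on a few coordinates, the truncation zeroes every coordinate outside the `s` largest at each step, so the returned vector is exactly `s`-sparse by construction. How to choose `s` and how to initialize are the parts this sketch leaves open.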
