Uniqueness of Tensor Decompositions with Applications to Polynomial Identifiability

We give a robust version of the celebrated result of Kruskal on the uniqueness of tensor decompositions: we prove that given a tensor whose decomposition satisfies a robust form of Kruskal's rank condition, it is possible to approximately recover the decomposition if the tensor is known up to a sufficiently small (inverse polynomial) error. Kruskal's theorem has found many applications in proving the identifiability of parameters for various latent variable models and mixture models, such as hidden Markov models and topic models. Our robust version immediately implies identifiability using only polynomially many samples in many of these settings. This polynomial identifiability is an essential first step towards efficient learning algorithms for these models. Recently, algorithms based on tensor decompositions have been used to estimate the parameters of various hidden variable models efficiently, provided the models satisfy certain "non-degeneracy" properties. Our methods give a way to go beyond this non-degeneracy barrier and establish polynomial identifiability of the parameters under much milder conditions. Given the importance of Kruskal's theorem in the tensor literature, we expect this robust version to find applications beyond the settings we explore in this work.
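For reference, Kruskal's condition can be stated as follows. Write a third-order tensor of rank at most R as

    T = \sum_{i=1}^{R} a_i \otimes b_i \otimes c_i,

and let A, B, C be the matrices whose columns are the a_i, b_i, c_i. The Kruskal rank k_A of A is the largest k such that every set of k columns of A is linearly independent. Kruskal's theorem states that if

    k_A + k_B + k_C \ge 2R + 2,

then the decomposition is unique up to permuting the rank-one terms and rescaling the factors within each term. The robust version discussed above replaces exact Kruskal rank with a well-conditioned analogue (every k columns are far from linearly dependent, in a quantitative sense); the precise statement is in the body of the paper, so treat this display as a schematic.

The "non-degeneracy" regime mentioned above is the stronger setting where the factor matrices have full column rank, in which case the decomposition can already be recovered by classical simultaneous diagonalization (often attributed to Jennrich, Chang, and Leurgans et al.). The following minimal numpy sketch of that classical routine is included only to make the contrast concrete; it is not the algorithm of this paper, which handles the weaker Kruskal-type conditions.

import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 5

# Random factors in the "non-degeneracy" regime: A and B have full
# column rank, and C generically has no parallel columns.
A = rng.standard_normal((n, r))
B = rng.standard_normal((n, r))
C = rng.standard_normal((n, r))

# T = sum_i a_i (x) b_i (x) c_i
T = np.einsum('ia,ja,ka->ijk', A, B, C)

# Two random contractions along the third mode: T(.,.,x) = A diag(C^T x) B^T.
x, y = rng.standard_normal(n), rng.standard_normal(n)
Mx = np.einsum('ijk,k->ij', T, x)
My = np.einsum('ijk,k->ij', T, y)

# Mx pinv(My) = A diag((C^T x) / (C^T y)) pinv(A), so its eigenvectors with
# nonzero eigenvalues are (generically) the columns of A, up to scaling.
eigvals, eigvecs = np.linalg.eig(Mx @ np.linalg.pinv(My))
order = np.argsort(-np.abs(eigvals))
A_hat = np.real(eigvecs[:, order[:r]])

# Check recovery up to column scaling/permutation: each recovered column
# should be nearly parallel to some true column of A.
cos = np.abs((A_hat / np.linalg.norm(A_hat, axis=0)).T
             @ (A / np.linalg.norm(A, axis=0)))
print(np.round(cos.max(axis=1), 4))  # all entries should be close to 1.0

The same trick applied to the transposed contractions recovers B, and C then follows by solving a linear system; the point is that this route breaks down exactly when the full-rank assumption fails, which is the barrier the robust Kruskal condition is meant to circumvent.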
