When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of "higher-order" expansion conditions on the topic-word matrix or on the population structure of the model. This set of higher-order expansion conditions allows for overcomplete models and requires the existence of a perfect matching from latent topics to higher-order observed words. We establish that random structured topic models are identifiable with high probability in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity, which is contained in the class of Tucker decompositions but is more general than the Candecomp/Parafac (CP) decomposition.
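To make the two decomposition classes concrete, here is the standard contrast in generic notation (a sketch, not taken verbatim from the paper). The CP decomposition is a Tucker decomposition whose core tensor is (super)diagonal; the class covered by the identifiability results consists of Tucker forms whose core is sparse in a structured way, sitting strictly between the two:

    % CP decomposition: k rank-one terms, i.e. a Tucker form whose
    % core tensor is (super)diagonal.
    T = \sum_{i=1}^{k} \lambda_i \, a_i \otimes b_i \otimes c_i

    % General Tucker decomposition: a core tensor G mixes every
    % combination of factor columns.
    T = \sum_{i,j,l} G_{ijl} \, a_i \otimes b_j \otimes c_l

The perfect-matching requirement behind the expansion conditions can also be illustrated numerically. Below is a minimal Python sketch with illustrative dimensions and sparsity levels of our own choosing (this is not the paper's algorithm or its precise conditions): sample a random sparse topic-word support, form the bipartite support graph between topics and second-order word pairs, and check whether a matching saturating all topics exists.

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import maximum_bipartite_matching

    rng = np.random.default_rng(0)
    p, k, density = 50, 400, 0.1  # vocabulary size, number of topics (k >> p), support density

    # Support of the topic-word matrix A (p x k): which words each topic can emit.
    A = (rng.random((p, k)) < density).astype(np.int8)

    # Second-order support: topic j connects to the word pair (u, v) iff
    # topic j can emit both u and v (column-wise Khatri-Rao of the support).
    pair_support = np.einsum('uj,vj->uvj', A, A).reshape(p * p, k)

    # Matching that saturates the topics: rows = topics, columns = word pairs;
    # the matching condition asks that every row be matched to a distinct column.
    graph = csr_matrix(pair_support.T)
    match = maximum_bipartite_matching(graph, perm_type='column')
    print("every topic matched to a distinct word pair:", bool((match != -1).all()))

For random sparse supports of this kind, the matching typically exists even though the number of topics k is far larger than the vocabulary size p, which is the sense in which random structured models satisfy the higher-order expansion conditions with high probability in the overcomplete regime.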
