Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

Author(s): Anandkumar, Animashree; Hsu, Daniel; Javanmard, Adel; Kakade, Sham M.

Abstract: Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including probabilistic topic models and latent linear Bayesian networks, using only second-order observed moments. The sufficient conditions for identifiability of these models are primarily based on weak expansion constraints on the topic-word matrix, for topic models, and on the directed acyclic graph, for Bayesian networks. Because no assumptions are made on the distribution among the latent variables, the approach can handle arbitrary correlations among the topics or latent factors. In addition, a tractable learning method via $\ell_1$ optimization is proposed and studied in numerical experiments.
