Low-Rank Spectral Learning

Spectral learning methods have recently been proposed as alternatives to slow, non-convex optimization algorithms like EM for a variety of probabilistic models in which hidden information must be inferred by the learner. These methods are typically controlled by a rank hyperparameter that sets the complexity of the model; when the model rank matches the true rank of the process generating the data, the resulting predictions are provably consistent and admit finite sample convergence bounds. However, in practice we usually do not know the true rank, and, in any event, from a computational and statistical standpoint it is likely to be prohibitively large. It is therefore of great practical interest to understand the behavior of low-rank spectral learning, where the model rank is less than the true rank. Counterintuitively, we show that even when the singular values omitted by lowering the rank are arbitrarily small, the resulting prediction errors can in fact be arbitrarily large. We identify two distinct possible causes for this bad behavior, and illustrate them with simple examples. We then show that these two causes are essentially complete: assuming that they do not occur, we can prove that the prediction error is bounded in terms of the magnitudes of the omitted singular values. We argue that the assumptions necessary for this result are relatively realistic, making low-rank spectral learning a viable option for many applications.
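To make the rank hyperparameter concrete, here is a minimal sketch of spectral learning with SVD truncation, following the standard observable-operator recipe of Hsu, Kakade, and Zhang; it is an illustration under assumptions, not this paper's exact construction. The toy 3-state HMM, the moment matrices, and all helper names are invented for the example. Because the moments are computed exactly rather than estimated from samples, any prediction error at rank k below the true rank is due solely to the truncation, which is the regime the paper studies.

```python
# Sketch: spectral learning of an HMM with a rank-k truncated SVD.
# All quantities below (the toy HMM T, O, pi and the helper names) are
# illustrative assumptions, not the paper's construction.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 3                                         # hidden states = true rank
T = rng.dirichlet(np.ones(n), size=n).T       # T[i, j] = Pr[s' = i | s = j]
O = rng.dirichlet(np.ones(n), size=n).T       # O[x, j] = Pr[obs = x | s = j]
pi = np.ones(n) / n                           # initial state distribution

# Exact low-order moments of the process (observations indexed 0..n-1):
#   P1[x]     = Pr[x1 = x]
#   P21[y, x] = Pr[x2 = y, x1 = x]
#   P3x1[x]   = matrix with entries Pr[x3 = z, x2 = x, x1 = y]
P1 = O @ pi
P21 = O @ T @ np.diag(pi) @ O.T
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T for x in range(n)]

def spectral_model(k):
    """Rank-k observable operators (b1, binf, {Bx}) from the exact moments."""
    U = np.linalg.svd(P21)[0][:, :k]          # top-k left singular vectors
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    Bx = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]
    return b1, binf, Bx

def spectral_prob(seq, model):
    """Predicted Pr[seq] = binf' B_{x_t} ... B_{x_1} b1."""
    b1, binf, Bx = model
    b = b1
    for x in seq:
        b = Bx[x] @ b
    return binf @ b

def true_prob(seq):
    """Exact Pr[seq] by the forward recursion."""
    b = pi
    for x in seq:
        b = T @ (O[x] * b)                    # emit x, then transition
    return b.sum()

# Full rank (k = n) reproduces the HMM exactly; k < n exposes the
# truncation error the paper analyzes.
for k in (n, n - 1):
    m = spectral_model(k)
    err = max(abs(spectral_prob(s, m) - true_prob(s))
              for s in product(range(n), repeat=3))
    print(f"rank {k}: max |error| over length-3 strings = {err:.2e}")
```

At k equal to the true rank, the recovered string probabilities match the forward recursion to machine precision; at k = 2 the error reflects the discarded singular value, though, as the paper shows, small omitted singular values do not by themselves guarantee small prediction error.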
