Spectral Learning Algorithms for Natural Language Processing

Recent work in machine learning and NLP has developed spectral algorithms for many learning tasks involving latent variables. Spectral algorithms rely on singular value decomposition as a basic operation, usually followed by a simple estimation step based on the method of moments. From a theoretical point of view, these methods are appealing in that they offer consistent estimators (and PAC-style guarantees of sample complexity) for several important latent-variable models. This is in contrast to the EM algorithm, which, while extremely successful in practice, is only guaranteed to reach a local maximum of the likelihood function. From a practical point of view, the methods (unlike EM) require no careful initialization, and have recently been shown to be highly efficient: as one example, in work under submission by the authors on learning latent-variable PCFGs, a spectral algorithm matches EM in accuracy while running around 20 times faster.
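
As a concrete illustration of the SVD-plus-moments recipe, the sketch below implements the spectral HMM estimator of Hsu, Kakade, and Zhang (2009) in Python/NumPy: empirical unigram, bigram, and trigram probabilities are collected, the top-k left singular vectors of the bigram matrix are extracted, and method-of-moments formulas yield "observable operator" parameters. This is a minimal sketch under our own naming and data-layout assumptions, not code from the tutorial itself.

    import numpy as np

    def spectral_hmm(sequences, n_symbols, k):
        """Spectral estimation of an HMM with k hidden states (a sketch,
        following Hsu, Kakade & Zhang 2009; names and layout are ours)."""
        # Empirical moments: P1[i] = Pr(x1 = i), P21[i, j] = Pr(x2 = i, x1 = j),
        # P3x1[x][i, j] = Pr(x3 = i, x2 = x, x1 = j).
        P1 = np.zeros(n_symbols)
        P21 = np.zeros((n_symbols, n_symbols))
        P3x1 = np.zeros((n_symbols, n_symbols, n_symbols))
        for seq in sequences:
            P1[seq[0]] += 1.0
            for t in range(len(seq) - 1):
                P21[seq[t + 1], seq[t]] += 1.0
            for t in range(len(seq) - 2):
                P3x1[seq[t + 1], seq[t + 2], seq[t]] += 1.0
        P1 /= P1.sum()
        P21 /= P21.sum()
        P3x1 /= P3x1.sum()

        # The SVD step: project onto the top-k left singular vectors
        # of the bigram co-occurrence matrix.
        U, _, _ = np.linalg.svd(P21)
        U = U[:, :k]

        # Method-of-moments estimates of the observable operators.
        b1 = U.T @ P1
        binf = np.linalg.pinv(P21.T @ U) @ P1
        proj = np.linalg.pinv(U.T @ P21)
        B = [U.T @ P3x1[x] @ proj for x in range(n_symbols)]
        return b1, binf, B

    def sequence_prob(seq, b1, binf, B):
        """Joint probability of a sequence: binf' B[xt] ... B[x1] b1."""
        b = b1
        for x in seq:
            b = B[x] @ b
        return float(binf @ b)

    # Toy usage: two observation symbols, two hidden states assumed.
    data = [[0, 1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 1, 0, 1]]
    b1, binf, B = spectral_hmm(data, n_symbols=2, k=2)
    print(sequence_prob([0, 1], b1, binf, B))

Note that the SVD plays the role that careful initialization plays for EM: the projection is computed deterministically from the data, so there are no random restarts and no local optima in the estimation step.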
