Spectral Dependency Parsing with Latent Variables

Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn dependency tree structures. We propose a simple yet powerful latent variable generative model for dependency parsing, and a spectral learning method to efficiently estimate it. As a pilot experimental evaluation, we use the spectral tree probabilities estimated by our model to re-rank the outputs of a near state-of-the-art parser. Our approach gives us a moderate reduction in error of up to 4.6% over the baseline re-ranker.

[1]  Karl Stratos,et al.  Spectral Learning of Latent-Variable PCFGs , 2012, ACL.

[2]  Trevor Darrell,et al.  Conditional Random Fields for Object Recognition , 2004, NIPS.

[3]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[4]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[5]  Fernando Pereira,et al.  Discriminative learning and spanning tree algorithms for dependency parsing , 2006 .

[6]  Ariadna Quattoni,et al.  Spectral Learning for Non-Deterministic Dependency Parsing , 2012, EACL.

[7]  Sham M. Kakade,et al.  A spectral algorithm for learning Hidden Markov Models , 2008, J. Comput. Syst. Sci..

[8]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[9]  Michael Collins,et al.  Ranking Algorithms for Named Entity Extraction: Boosting and the VotedPerceptron , 2002, ACL.

[10]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[11]  Michael Collins,et al.  Discriminative Reranking for Natural Language Parsing , 2000, CL.

[12]  Gabriele Musillo,et al.  Unlexicalised Hidden Variable Models of Split Dependency Grammars , 2008, ACL.

[13]  Sebastiaan A. Terwijn,et al.  On the Learnability of Hidden Markov Models , 2002, ICGI.

[14]  Dean P. Foster,et al.  Multi-View Learning of Word Embeddings via CCA , 2011, NIPS.

[15]  Bernard Mérialdo,et al.  Tagging English Text with a Probabilistic Model , 1994, CL.

[16]  Le Song,et al.  A Spectral Algorithm for Latent Tree Graphical Models , 2011, ICML.

[17]  Dean P. Foster,et al.  Spectral dimensionality reduction for HMMs , 2012, ArXiv.

[18]  Xavier Carreras,et al.  Simple Semi-supervised Dependency Parsing , 2008, ACL.

[19]  Giorgio Satta,et al.  Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars , 1999, ACL.

[20]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[21]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.