Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags: requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, allowing words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus, 0.7% higher than using gold tags.
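The one-tag-per-word constraint used in the ablation can be illustrated with a small sketch. It assumes that the constraint is imposed by mapping every word type to its most frequent tag in the corpus, which is one plausible reading and is not specified in the abstract. The function names, the toy corpus, and the tie-breaking rule are hypothetical.

```python
from collections import Counter, defaultdict


def most_frequent_tag_map(tagged_sentences):
    """Map each word type to its single most frequent tag (assumed collapse rule)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    # Ties are broken alphabetically by tag name so the result is deterministic.
    return {
        word: min(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for word, tag_counts in counts.items()
    }


def retag_one_per_word(tagged_sentences, tag_map):
    """Replace every token's tag with the single tag assigned to its word type."""
    return [[(word, tag_map[word]) for word, _ in sentence]
            for sentence in tagged_sentences]


# Toy corpus (illustrative only): "plant" is a noun twice and a verb once.
corpus = [
    [("the", "DT"), ("plant", "NN"), ("grows", "VBZ")],
    [("they", "PRP"), ("plant", "VBP"), ("trees", "NNS")],
    [("a", "DT"), ("plant", "NN"), ("closed", "VBD")],
]

tag_map = most_frequent_tag_map(corpus)
collapsed = retag_one_per_word(corpus, tag_map)
```

In the collapsed corpus the verbal use of "plant" in the second sentence carries the noun tag, which is the kind of lost contextual distinction that the ablation measures.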
