Unsupervised Methods for Head Assignments

We present several algorithms for assigning heads in phrase structure trees, based on different linguistic intuitions about the role of heads in natural language syntax. The starting point of our approach is the observation that a head-annotated treebank defines a unique lexicalized tree substitution grammar. This allows us to go back and forth between the two representations, and to define objective functions for the unsupervised learning of head assignments in terms of features of the implicit lexicalized tree grammars. We evaluate the algorithms on two tasks: the match with gold-standard head annotations, and the comparative parsing accuracy of the lexicalized grammars they give rise to. On the first task, we approach the accuracy of hand-designed heuristics for English and inter-annotation-standard agreement for German. On the second task, the implied lexicalized grammars score 4 percentage points higher on parsing accuracy than lexicalized grammars derived by commonly used heuristics.
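The correspondence between a head-annotated tree and a lexicalized tree substitution grammar can be illustrated with a minimal sketch. It assumes that every internal node records the index of its head child and that words appear only under preterminals. The names `Node` and `extract` and the bracketed string format are hypothetical choices made for this illustration and are not taken from the paper. Each elementary tree is obtained by following head children down to a single word, while non-head children are cut off and left behind as substitution sites (written as bare labels).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A phrase structure node; a word is a Node without children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    head: Optional[int] = None  # index of the head child (assumed annotation)


def extract(node: Node) -> List[str]:
    """Return the elementary trees implied by the head annotation.

    The first element is the tree anchored in the lexical head of `node`;
    the remaining elements come from the non-head subtrees.
    """
    fragments: List[str] = []

    def spine(n: Node) -> str:
        if not n.children:  # a word: the lexical anchor
            return n.label
        parts = []
        for i, child in enumerate(n.children):
            if i == n.head:
                # The head child stays inside the current elementary tree.
                parts.append(spine(child))
            else:
                # A non-head child becomes a substitution site here
                # and the root of its own elementary tree(s).
                parts.append(child.label)
                fragments.extend(extract(child))
        return "(" + n.label + " " + " ".join(parts) + ")"

    fragments.insert(0, spine(node))
    return fragments


# Toy example: (S (NP (DT the) (NN dog)) (VP (VBZ barks)))
# with heads S -> VP, NP -> NN, VP -> VBZ.
tree = Node("S", [
    Node("NP", [
        Node("DT", [Node("the")], head=0),
        Node("NN", [Node("dog")], head=0),
    ], head=1),
    Node("VP", [Node("VBZ", [Node("barks")], head=0)], head=0),
], head=1)

elementary_trees = extract(tree)
for t in elementary_trees:
    print(t)
```

In this toy example the three words yield three elementary trees, each with exactly one lexical anchor. Choosing a different head child at any node changes the set of extracted trees, which is why properties of the extracted grammar can serve as a signal for comparing head assignments.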
