Annealing Structural Bias in Multilingual Weighted Grammar Induction

We first show how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models trained by EM from unannotated examples (Klein and Manning, 2004). Next, by annealing the free parameter that controls this bias, we achieve further improvements. We then describe an alternative kind of structural bias, toward "broken" hypotheses consisting of partial structures over segmented sentences, and show a similar pattern of improvement. We relate this approach to contrastive estimation (Smith and Eisner, 2005a), apply the latter to grammar induction in six languages, and show that our new approach improves accuracy by 1-17% (absolute) over CE (and 8-30% over EM), achieving to our knowledge the best results on this task to date. Our method, structural annealing, is a general technique with broad applicability to hidden-structure discovery problems.
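The idea of a locality bias controlled by one annealed parameter can be illustrated with a minimal sketch. It assumes, purely for illustration, that the bias enters as an additive term in log space: `delta` times the total dependency length of a candidate tree. The abstract does not state this functional form, and the names `total_dependency_length`, `biased_log_score`, `anneal_schedule`, and all numeric values are hypothetical. The sketch only scores a fixed tree and is not the training procedure itself.

```python
def total_dependency_length(heads):
    """Sum of |child - head| over attached words.

    heads[i] is the 1-based position of the head of word i+1;
    0 marks the root, which contributes no length.
    """
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)


def biased_log_score(model_log_prob, heads, delta):
    """Unnormalized log score of a tree under the assumed locality bias.

    A negative delta penalizes long attachments (favoring local structure);
    delta = 0 recovers the unbiased model score.
    """
    return model_log_prob + delta * total_dependency_length(heads)


def anneal_schedule(delta_start, delta_stop, step):
    """List of delta values from delta_start to delta_stop, inclusive.

    Uses an integer step count to avoid accumulating float error.
    """
    n_steps = round((delta_stop - delta_start) / step)
    return [round(delta_start + k * step, 10) for k in range(n_steps + 1)]


# Hypothetical three-word sentence: word 2 is the root, words 1 and 3 attach to it.
heads = [2, 0, 2]
for delta in anneal_schedule(-0.6, 0.0, 0.2):
    # In a real system the model would be retrained at each delta value;
    # here the model log-probability is a fixed placeholder.
    print(delta, biased_log_score(-5.0, heads, delta))
```

In this sketch the bias starts strong and is relaxed step by step, so that early training would prefer short attachments and later training would rely more on the model's own probabilities.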

[1]  Hiyan Alshawi,et al.  Head Automata and Bilingual Tiling: Translation with Minimal Representations , 1996, ACL.

[2]  Noah A. Smith,et al.  Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.

[3]  Nir Friedman,et al.  Learning Hidden Variable Networks: The Information Bottleneck Approach , 2005, J. Mach. Learn. Res.

[4]  Eric Brill,et al.  Automatically Acquiring Phrase Structure Using Distributional Analysis , 1992, HLT.

[5]  Dilek Z. Hakkani-Tür,et al.  Building a Turkish Treebank , 2003 .

[6]  Donald Hindle,et al.  Noun Classification From Predicate-Argument Structures , 1990, ACL.

[7]  Dan Klein,et al.  Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.

[8]  Kiril Ivanov Simov,et al.  Practical Annotation Scheme for an HPSG Treebank of Bulgarian , 2003, LINC@EACL.

[9]  Mark Steedman,et al.  Bootstrapping statistical parsers from small datasets , 2003, EACL.

[10]  P. Osenova,et al.  An HPSG-based Syntactic Treebank of Bulgarian (BulTreeBank) , 2002 .

[11]  Jason Eisner.  Bilexical Grammars and a Cubic-time Probabilistic Parser , 1997, IWPT.

[12]  Naonori Ueda,et al.  Deterministic annealing EM algorithm , 1998, Neural Networks.

[13]  Kemal Oflazer,et al.  The Annotation Process in the Turkish Treebank , 2003, LINC@EACL.

[14]  Eckhard Bick,et al.  Floresta Sintá(c)tica: A treebank for Portuguese , 2002, LREC.

[15]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm , 1977 .

[16]  Giorgio Satta,et al.  Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars , 1999, ACL.

[17]  Sabine Buchholz,et al.  CoNLL-X Shared Task on Multilingual Dependency Parsing , 2006, CoNLL.

[18]  Damianos Karakos,et al.  Bootstrapping Without the Boot , 2005, HLT.

[19]  Martha Palmer,et al.  The Penn Chinese TreeBank: Phrase structure annotation of a large corpus , 2005, Natural Language Engineering.

[20]  Koby Crammer,et al.  Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[21]  Petya Osenova,et al.  Design and Implementation of the Bulgarian HPSG-based Treebank , 2004 .

[22]  Dan Klein,et al.  Fast Exact Inference with a Factored Model for Natural Language Parsing , 2002, NIPS.

[23]  Noah A. Smith,et al.  Annealing Techniques For Unsupervised Statistical Language Learning , 2004, ACL.

[24]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[25]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[26]  Noah A. Smith,et al.  Guiding Unsupervised Grammar Induction Using Contrastive Estimation , 2005 .

[27]  Noah A. Smith,et al.  Parsing with Soft and Hard Constraints on Dependency Length , 2005 .

[28]  Dan Klein,et al.  A Generative Constituent-Context Model for Improved Grammar Induction , 2002, ACL.

[29]  Eugene Charniak,et al.  Statistical language learning , 1997 .

[30]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[31]  K. Rose.  Deterministic annealing for clustering, compression, classification, regression, and related optimization problems , 1998, Proc. IEEE.

[32]  J. Baker.  Trainable grammars for speech recognition , 1979 .

[33]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[34]  Sabine Brants,et al.  The TIGER Treebank , 2001 .