Paper Information - Annealing Structural Bias in Multilingual Weighted Grammar Induction

We first show how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models trained by EM from unannotated examples (Klein and Manning, 2004). Next, by annealing the free parameter that controls this bias, we achieve further improvements. We then describe an alternative kind of structural bias, toward "broken" hypotheses consisting of partial structures over segmented sentences, and show a similar pattern of improvement. We relate this approach to contrastive estimation (Smith and Eisner, 2005a), apply the latter to grammar induction in six languages, and show that our new approach improves accuracy by 1-17% (absolute) over CE (and 8-30% over EM), achieving to our knowledge the best results on this task to date. Our method, structural annealing, is a general technique with broad applicability to hidden-structure discovery problems.
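The idea of annealing a locality bias can be illustrated with a toy sketch. The abstract does not give the form of the bias, so the code below assumes one: each candidate dependency structure is reweighted by exp(delta * total dependency length), and delta is moved from a negative value (strong preference for short attachments) toward zero. The function names, the candidate list, and the schedule values are all hypothetical, and the EM re-estimation that would run at each value of delta is omitted.

```python
import math


def total_dependency_length(heads):
    """Sum of |child - head| over all non-root words.

    heads[i] is the 0-based index of word i's head, or -1 for the root.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h >= 0)


def biased_posterior(candidates, delta):
    """Reweight candidate parses by exp(delta * length) and renormalize.

    candidates: list of (heads, model_probability) pairs.
    Assumed bias form; delta < 0 favors short dependencies.
    """
    weights = [
        p * math.exp(delta * total_dependency_length(heads))
        for heads, p in candidates
    ]
    z = sum(weights)
    return [w / z for w in weights]


def annealing_schedule(start, stop, step):
    """Return delta values from start to stop (inclusive) in equal steps."""
    n = int(round((stop - start) / step))
    return [start + k * step for k in range(n + 1)]


# Two hypothetical parses of a 3-word sentence, equally likely under the model.
candidates = [
    ([1, -1, 1], 0.5),  # word 1 is the root; total length 2
    ([2, 0, -1], 0.5),  # word 2 is the root; total length 3
]

for delta in annealing_schedule(-1.0, 0.0, 0.5):
    # In a full system, EM would be rerun here under the current bias.
    print(delta, biased_posterior(candidates, delta))
```

With a strongly negative delta the shorter structure dominates; as delta approaches zero the bias vanishes and the model's own probabilities are recovered.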

Noah A. Smith | Jason Eisner

[1] Hiyan Alshawi, et al. Head Automata and Bilingual Tiling: Translation with Minimal Representations, 1996, ACL.

[2] Noah A. Smith, et al. Contrastive Estimation: Training Log-Linear Models on Unlabeled Data, 2005, ACL.

[3] Nir Friedman, et al. Learning Hidden Variable Networks: The Information Bottleneck Approach, 2005, J. Mach. Learn. Res.

[4] Eric Brill, et al. Automatically Acquiring Phrase Structure Using Distributional Analysis, 1992, HLT.

[5] Dilek Z. Hakkani-Tür, et al. Building a Turkish Treebank, 2003.

[6] Donald Hindle, et al. Noun Classification From Predicate-Argument Structures, 1990, ACL.

[7] Dan Klein, et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, 2004, ACL.

[8] Kiril Ivanov Simov, et al. Practical Annotation Scheme for an HPSG Treebank of Bulgarian, 2003, LINC@EACL.

[9] Mark Steedman, et al. Bootstrapping statistical parsers from small datasets, 2003, EACL.

[10] P. Osenova, et al. An HPSG-based Syntactic Treebank of Bulgarian (BulTreeBank), 2002.

[11] Jason Eisner. Bilexical Grammars and a Cubic-time Probabilistic Parser, 1997, IWPT.