Paper information - Head-Driven PCFGs with Latent-Head Statistics

Head-Driven PCFGs with Latent-Head Statistics

Although state-of-the-art parsers for natural language are lexicalized, it was recently shown that an accurate unlexicalized parser for the Penn tree-bank can be simply read off a manually refined tree-bank. While lexicalized parsers often suffer from sparse data, manual mark-up is costly and largely based on individual linguistic intuition. Thus, across domains, languages, and tree-bank annotations, a fundamental question arises: Is it possible to automatically induce an accurate parser from a tree-bank without resorting to full lexicalization? In this paper, we show how to induce head-driven probabilistic parsers with latent heads from a tree-bank. Our automatically trained parser has a performance of 85.7% (LP/LR F1), which is already better than that of early lexicalized ones.

Detlef Prescher

[1] Michael Collins, et al. A New Statistical Parser Based on Bigram Lexical Dependencies, 1996, ACL.

[2] Detlef Prescher, et al. A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars, 2004, ArXiv.

[3] Jun'ichi Tsujii, et al. Probabilistic CFG with Latent Annotations, 2005, ACL.

[4] Helmut Schmid. Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors, 2004, COLING.

[5] David M. Magerman. Statistical Decision-Tree Models for Parsing, 1995, ACL.

[6] Ronald M. Kaplan, et al. Lexical-Functional Grammar: A Formal System for Grammatical Representation, 2004.

[7] Detlef Prescher, et al. Inducing Head-Driven PCFGs with Latent Heads: Refining a Tree-Bank Grammar for Parsing, 2005, ECML.

[8] Michael I. Jordan, et al. Factorial Hidden Markov Models, 1995, Machine Learning.

[9] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[10] Mats Rooth, et al. Valence Induction with a Head-Lexicalized PCFG, 1998, EMNLP.

[11] Frank Keller, et al. Probabilistic Parsing for German Using Sister-Head Dependencies, 2003, ACL.