Accurate Unlexicalized Parsing

We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.
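To make the idea of a state split concrete, the following is a minimal sketch of one well-known split, parent annotation, in which each phrasal category is refined by the category of its parent. The abstract does not list the specific splits used, so this is an illustration under assumptions, not the paper's procedure: the tuple-based tree encoding, the `^` separator, the example sentence, and the function names `parent_annotate` and `count_rules` are all hypothetical.

```python
from collections import Counter

# Assumed encoding: a tree is (label, child, child, ...); a leaf is a plain string.

def parent_annotate(tree, parent=None):
    """Return a copy of `tree` with each phrasal label split by its parent's label."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    # In this sketch, the root and preterminals (tags directly above a word) stay unsplit.
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    new_label = label if parent is None or is_preterminal else f"{label}^{parent}"
    # Children see the original (unsplit) label of this node as their parent.
    return (new_label, *(parent_annotate(child, label) for child in children))

def count_rules(tree, counts=None):
    """Count the productions (lhs, rhs) used in `tree`, including lexical ones."""
    counts = Counter() if counts is None else counts
    if isinstance(tree, str):
        return counts
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        count_rules(child, counts)
    return counts

# Hypothetical example tree for "She saw the dog".
tree = ("S",
        ("NP", ("PRP", "She")),
        ("VP", ("VBD", "saw"),
               ("NP", ("DT", "the"), ("NN", "dog"))))

annotated = parent_annotate(tree)
plain_rules = count_rules(tree)
split_rules = count_rules(annotated)

# In the plain grammar, subject and object NPs share one symbol and one rule distribution.
print(sorted(rule for rule in plain_rules if rule[0] == "NP"))
# After the split, NP^S and NP^VP are separate symbols with separate rule counts.
print(sorted(rule for rule in split_rules if rule[0].startswith("NP^")))
```

In the unsplit grammar, the subject NP and the object NP draw their expansions from the same distribution, which is the kind of independence assumption the abstract calls false. After the split, `NP^S` and `NP^VP` are estimated separately, while the grammar remains an ordinary unlexicalized PCFG.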