A Best-First Probabilistic Shift-Reduce Parser

Recently proposed deterministic classifier-based parsers (Nivre and Scholz, 2004; Sagae and Lavie, 2005; Yamada and Matsumoto, 2003) offer attractive alternatives to generative statistical parsers. Deterministic parsers are fast, efficient, and simple to implement, but generally less accurate than optimal (or nearly optimal) statistical parsers. We present a statistical shift-reduce parser that bridges the gap between deterministic and probabilistic parsers. The parsing model is essentially the same as one previously used for deterministic parsing, but the parser performs a best-first search instead of a greedy search. Using the standard sections of the WSJ corpus of the Penn Treebank for training and testing, our parser has 88.1% precision and 87.8% recall (using automatically assigned part-of-speech tags). Perhaps more interestingly, the parsing model is significantly different from the generative models used by other well-known accurate parsers, allowing for a simple combination that produces precision and recall of 90.9% and 90.7%, respectively.
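The difference between greedy and best-first search over shift-reduce actions can be illustrated with a small sketch. This is not the authors' implementation: the function names `best_first_parse` and `toy_model`, the binary-only reduce action, and the hard-coded action probabilities are all assumptions made for illustration. A real parser would obtain action probabilities from a trained classifier. The sketch keeps partial derivations in a priority queue ordered by negative log-probability and always expands the most probable one, whereas a greedy parser would follow only the single best action at each step.

```python
import heapq
import itertools
import math


def best_first_parse(words, action_probs, max_steps=10_000):
    """Return (probability, tree) for the most probable complete derivation.

    action_probs(stack, queue) is a hypothetical scoring function that
    returns a dict mapping actions to probabilities. Actions are
    ("shift",) or ("reduce", label); reduce joins the top two stack items.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares states
    # Heap entries: (negative log-probability, tie-breaker, stack, remaining words)
    heap = [(0.0, next(counter), (), tuple(words))]
    for _ in range(max_steps):
        if not heap:
            break
        cost, _, stack, queue = heapq.heappop(heap)
        # A state is complete when the input is consumed and one tree remains.
        if not queue and len(stack) == 1:
            return math.exp(-cost), stack[0]
        for action, p in action_probs(stack, queue).items():
            if p <= 0.0:
                continue
            if action[0] == "shift" and queue:
                new_stack, new_queue = stack + (queue[0],), queue[1:]
            elif action[0] == "reduce" and len(stack) >= 2:
                node = (action[1], stack[-2], stack[-1])
                new_stack, new_queue = stack[:-2] + (node,), queue
            else:
                continue  # action not applicable in this state
            new_cost = cost - math.log(p)
            heapq.heappush(heap, (new_cost, next(counter), new_stack, new_queue))
    return None


def toy_model(stack, queue):
    """Hypothetical stand-in for a trained classifier (assumed numbers)."""
    can_shift = bool(queue)
    can_reduce = len(stack) >= 2
    if can_shift and can_reduce:
        return {("shift",): 0.6, ("reduce", "X"): 0.4}
    if can_shift:
        return {("shift",): 1.0}
    if can_reduce:
        return {("reduce", "X"): 1.0}
    return {}


prob, tree = best_first_parse(["a", "b", "c"], toy_model)
print(prob, tree)
```

With this toy model the search returns the right-branching tree `("X", "a", ("X", "b", "c"))` with probability of roughly 0.6. Because every action probability is at most 1, the cost of a derivation never decreases as it grows, so the first complete state popped from the queue is the most probable one under the model.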

Alon Lavie | Kenji Sagae

[1] Daniel M. Bikel, et al. Design of a multi-lingual, parallel-processing statistical parsing engine, 2002.

[2] Michael Collins, et al. Three Generative, Lexicalised Models for Statistical Parsing, 1997, ACL.

[3] Adam L. Berger, et al. A Maximum Entropy Approach to Natural Language Processing, 1996, CL.

[4] Brian Roark, et al. Measuring Efficiency in High-accuracy, Broad-coverage Statistical Parsing, 2000, ELSPS.

[5] Yuji Matsumoto, et al. Statistical Dependency Analysis with Support Vector Machines, 2003, IWPT.

[6] Donald E. Knuth, et al. On the Translation of Languages from Left to Right, 1965, Inf. Control.

[7] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[8] Ted Briscoe, et al. Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars, 1993, CL.

[9] Michael Collins, et al. Head-Driven Statistical Models for Natural Language Parsing, 2003, CL.

[10] Dan Klein, et al. Fast Exact Inference with a Factored Model for Natural Language Parsing, 2002, NIPS.

[11] Alon Lavie, et al. A Classifier-Based Parser with Linear Run-Time Complexity, 2005, IWPT.

[12] Masaru Tomita, et al. The Generalized LR Parser/Compiler V8-4: A Software Package for Practical NL Projects, 1990, COLING.

[13] Eugene Charniak, et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking, 2005, ACL.

[14] Eugene Charniak, et al. Edge-Based Best-First Chart Parsing, 1998, VLC@COLING/ACL.

[15] M. T. Lino, et al. Proceedings of the 4th International Conference on Language Resources and Evaluation, 2004.

[16] Eugene Charniak, et al. A Maximum-Entropy-Inspired Parser, 2000, ANLP.

[17] Rens Bod. An efficient implementation of a new DOP model, 2003, EACL.

[18] Adwait Ratnaparkhi, et al. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models, 1997, EMNLP.

[19] Jun'ichi Tsujii, et al. Chunk Parsing Revisited, 2005, IWPT.

[20] Joakim Nivre, et al. Deterministic Dependency Parsing of English Text, 2004, COLING.

[21] Lluís Màrquez i Villodre, et al. SVMTool: A general POS Tagger Generator Based on Support Vector Machines, 2004, LREC.

[22] Masaru Tomita, et al. An Efficient Augmented-Context-Free Parsing Algorithm, 1987, Comput. Linguistics.