Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output

Constituency parser performance is primarily interpreted through a single metric, F-score on WSJ section 23, that conveys no linguistic information regarding the remaining errors. We classify errors within a set of linguistically meaningful types using tree transformations that repair groups of errors together. We use this analysis to answer a range of questions about parser behaviour, including what linguistic constructions are difficult for state-of-the-art parsers, what types of errors are being resolved by rerankers, and what types are introduced when parsing out-of-domain text.
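To make concrete why a single F-score says little about which errors remain, the following is a minimal sketch of labeled bracket scoring. It is not the evaluation code used in this work. The nested-tuple tree encoding and the names `labeled_spans` and `bracket_f1` are assumptions made for illustration, and the sketch omits the usual conventions of standard scorers (for example, punctuation handling and label equivalences).

```python
from collections import Counter

def labeled_spans(tree, start=0):
    """Collect labeled brackets from a tree given as nested tuples.

    A tree is either a word (str) or (label, child, child, ...).
    Returns (Counter of (label, start, end), end position).
    """
    if isinstance(tree, str):
        return Counter(), start + 1
    label, *children = tree
    spans = Counter()
    pos = start
    for child in children:
        child_spans, pos = labeled_spans(child, pos)
        spans += child_spans
    # Skip preterminals (a POS tag directly over a single word).
    if not (len(children) == 1 and isinstance(children[0], str)):
        spans[(label, start, pos)] += 1
    return spans, pos

def bracket_f1(gold, test):
    """Labeled bracket F1 between a gold tree and a parser output tree."""
    g, _ = labeled_spans(gold)
    t, _ = labeled_spans(test)
    matched = sum((g & t).values())
    precision = matched / sum(t.values()) if t else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: the gold tree attaches the PP to the object NP,
# while the parser output attaches it to the VP.
pp = ("PP", ("IN", "with"), ("NP", ("DT", "a"), ("NN", "telescope")))
obj = ("NP", ("DT", "the"), ("NN", "man"))
subj = ("NP", ("PRP", "I"))
gold = ("S", subj, ("VP", ("VBD", "saw"), ("NP", obj, pp)))
test = ("S", subj, ("VP", ("VBD", "saw"), obj, pp))

print(round(bracket_f1(gold, test), 3))  # 0.923
```

In this example the parser output misses one of seven gold brackets, giving an F1 of 12/13. The score alone does not reveal that the missing bracket corresponds to a PP attachment error, which is the kind of information a classification into linguistically meaningful error types is meant to recover.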
