Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

Broad-coverage parsing has reached a point where distinct approaches can offer (seemingly) comparable performance: statistical parsers acquired from the Penn Treebank (PTB); data-driven dependency parsers; "deep" parsers trained on enriched treebanks (in linguistic frameworks such as CCG, HPSG, or LFG); and hybrid "deep" parsers that employ hand-built grammars in, for example, HPSG, LFG, or LTAG. Evaluation against trees in the Wall Street Journal (WSJ) section of the PTB has helped advance parsing research over the past decade. Despite some skepticism, the crisp and, over time, stable task of maximizing ParsEval metrics (i.e., labeled constituent precision and recall) over PTB trees has served as the dominant benchmark. However, modern treebank parsers still restrict themselves to only a subset of PTB annotation; there is reason to worry about the idiosyncrasies of this particular corpus; it remains unknown how much the ParsEval metric (or any intrinsic evaluation) can tell NLP application developers; and PTB-style analyses leave much to be desired in terms of linguistic information.
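As a rough illustration of what maximizing ParsEval means, the sketch below scores a candidate tree against a gold tree by counting matching labeled brackets, i.e. (label, start, end) spans. It is a simplified sketch under our own assumptions: trees are encoded as nested Python tuples, the helper names `labeled_spans` and `parseval` are hypothetical, and conventions of standard scorers such as evalb (e.g. punctuation handling and label equivalences) are omitted.

```python
from collections import Counter
from typing import Union

# Assumed encoding: a tree is either a word (str) or a tuple
# (label, child1, child2, ...). This is illustrative, not a PTB reader.
Tree = Union[str, tuple]


def labeled_spans(tree: Tree) -> Counter:
    """Collect (label, start, end) brackets for every phrasal constituent.

    Preterminals (a label directly over a single word, i.e. POS tags)
    are skipped, since ParsEval scores phrasal brackets only.
    """
    spans: Counter = Counter()

    def walk(node: Tree, start: int) -> int:
        # Returns the index just past the last word covered by `node`.
        if isinstance(node, str):
            return start + 1
        label, *children = node
        end = start
        for child in children:
            end = walk(child, end)
        is_preterminal = len(children) == 1 and isinstance(children[0], str)
        if not is_preterminal:
            spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def parseval(gold: Tree, test: Tree) -> tuple[float, float, float]:
    """Labeled bracket precision, recall and F1 of `test` against `gold`."""
    g, t = labeled_spans(gold), labeled_spans(test)
    matched = sum((g & t).values())  # multiset intersection of brackets
    precision = matched / sum(t.values()) if t else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Toy example: the candidate parse gives the VP too short a span.
gold = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "saw"), ("NP", ("DT", "a"), ("NN", "cat"))))
test = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "saw")), ("NP", ("DT", "a"), ("NN", "cat")))

print(parseval(gold, test))  # (0.75, 0.75, 0.75)
```

Three of the four candidate brackets (both NPs and S) match the gold tree, while the truncated VP does not. This yields 0.75 for both precision and recall. Note that the score says nothing about which predicate-argument relations were lost, which is one reason to look beyond bracket matching.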
