Challenges in Mapping of Syntactic Representations for Framework-Independent Parser Evaluation

We explore issues and challenges created by the incompatibility of the diverse representation schemes used in syntactic parsing. In particular, we examine the problem of converting output formats in order to evaluate parsers that use different formalisms. We discuss recent related efforts and present an evaluation of parsers whose representations vary not only in formalism but also in depth of syntactic information. We attempt to compare these parsers in a domain widely used for parser evaluation, the Wall Street Journal section of the Penn Treebank, and in the academic biomedical literature, where parsing technologies are expected to contribute to practical applications such as information extraction and text mining.
