A Comparative Study of Syntactic Parsers for Event Extraction

The extraction of biomolecular events from text is an important task for a number of domain applications such as pathway construction. Several syntactic parsers have been used in Biomedical Natural Language Processing (BioNLP) applications, and the BioNLP 2009 Shared Task results suggest that incorporating syntactic analysis is important for achieving state-of-the-art performance. Direct comparison of parsers is complicated by differences such as the division between phrase structure- and dependency-based analyses and the variety of output formats, structures and representations applied. In this paper, we present a task-oriented comparison of five parsers, measuring their contribution to biomolecular event extraction using a state-of-the-art event extraction system. The results show that the parsers with domain models using dependency formats provide very similar performance, and that an ensemble of different parsers in different formats can improve the performance of the event extraction system.
