University of Turku in the BioNLP'11 Shared Task

BackgroundWe present a system for extracting biomedical events (detailed descriptions of biomolecular interactions) from research articles, developed for the BioNLP'11 Shared Task. Our goal is to develop a system easily adaptable to different event schemes, following the theme of the BioNLP'11 Shared Task: generalization, the extension of event extraction to varied biomedical domains. Our system extends our BioNLP'09 Shared Task winning Turku Event Extraction System, which uses support vector machines to first detect event-defining words, followed by detection of their relationships.ResultsOur current system successfully predicts events for every domain case introduced in the BioNLP'11 Shared Task, being the only system to participate in all eight tasks and all of their subtasks, with best performance in four tasks. Following the Shared Task, we improve the system on the Infectious Diseases task from 42.57% to 53.87% F-score, bringing performance into line with the similar GENIA Event Extraction and Epigenetics and Post-translational Modifications tasks. We evaluate the machine learning performance of the system by calculating learning curves for all tasks, detecting areas where additional annotated data could be used to improve performance. Finally, we evaluate the use of system output on external articles as additional training data in a form of self-training.ConclusionsWe show that the updated Turku Event Extraction System can easily be adapted to all presently available event extraction targets, with competitive performance in most tasks. The scope of the performance gains between the 2009 and 2011 BioNLP Shared Tasks indicates event extraction is still a new field requiring more work. We provide several analyses of event extraction methods and performance, highlighting potential future directions for continued development.

[1]  Jun'ichi Tsujii,et al.  Overview of BioNLP 2011 Protein Coreference Shared Task , 2011, BioNLP@ACL.

[2]  Christopher D. Manning,et al.  The Stanford Typed Dependencies Representation , 2008, CF+CDPE@COLING.

[3]  Fredric C. Gey,et al.  Proceedings of LREC , 2010 .

[4]  Sophia Ananiadou,et al.  Proceedings of BioNLP 2011 Workshop , 2011 .

[5]  Udo Hahn,et al.  Event Extraction from Trimmed Dependency Graphs , 2009, BioNLP@HLT-NAACL.

[6]  Eugene Charniak,et al.  Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking , 2005, ACL.

[7]  Jari Björne,et al.  EXTRACTING CONTEXTUALIZED COMPLEX BIOLOGICAL EVENTS WITH RICH GRAPH‐BASED FEATURE SETS , 2011, Comput. Intell..

[8]  Ellen Riloff,et al.  The Taming of Reconcile as a Biomedical Coreference Resolver , 2011, BioNLP@ACL.

[9]  Jun'ichi Tsujii,et al.  A Markov Logic Approach to Bio-Molecular Event Extraction , 2009, BioNLP@HLT-NAACL.

[10]  Halil Kilicoglu,et al.  Syntactic Dependency Based Heuristics for Biological Event Extraction , 2009, BioNLP@HLT-NAACL.

[11]  Jari Björne,et al.  Reconstruction of Semantic Relationships from Their Projections in Biomolecular Domain , 2010, BioNLP@ACL.

[12]  Jun'ichi Tsujii,et al.  Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task, BioNLP@HLT-NAACL 2009 - Shared Task, Boulder, Colorado, USA, June 5, 2009 , 2009, BioNLP@HLT-NAACL.

[13]  Jari Björne,et al.  Extracting Complex Biological Events with Rich Graph-Based Feature Sets , 2009, BioNLP@HLT-NAACL.

[14]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[15]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[16]  Andrew McCallum,et al.  Robust Biomedical Event Extraction with Dual Decomposition and Minimal Domain Adaptation , 2011, BioNLP@ACL.

[17]  Sampo Pyysalo,et al.  A Comparative Study of Syntactic Parsers for Event Extraction , 2010, BioNLP@ACL.

[18]  Sampo Pyysalo,et al.  Overview of BioNLP’09 Shared Task on Event Extraction , 2009, BioNLP@HLT-NAACL.

[19]  Tapio Salakoski,et al.  EVEX: A PubMed-Scale Resource for Homology-Based Generalization of Text Mining Predictions , 2011, BioNLP@ACL.

[20]  Jin-Dong Kim,et al.  Overview of the protein coreference task in BioNLP Shared Task 2011 , 2011 .

[21]  Sampo Pyysalo,et al.  Overview of the Entity Relations (REL) supporting task of BioNLP Shared Task 2011 , 2011, BioNLP@ACL.

[22]  Jun'ichi Tsujii Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task , 2009 .

[23]  Andrew McCallum,et al.  Model Combination for Event Extraction in BioNLP 2011 , 2011, BioNLP@ACL.

[24]  Sampo Pyysalo,et al.  Overview of the Epigenetics and Post-translational Modifications (EPI) task of BioNLP Shared Task 2011 , 2011, BioNLP@ACL.

[25]  Halil Kilicoglu,et al.  Adapting a General Semantic Interpretation Approach to Biological Event Extraction , 2011, BioNLP@ACL.

[26]  Jun'ichi Tsujii,et al.  Event Extraction with Complex Event Classification Using Rich Features , 2010, J. Bioinform. Comput. Biol..

[27]  Yvan Saeys,et al.  Analyzing text in search of bio-molecular events: a high-precision machine learning framework , 2009, BioNLP@HLT-NAACL.

[28]  J. Euzéby List of Bacterial Names with Standing in Nomenclature: a folder available on the Internet. , 1997, International journal of systematic bacteriology.

[29]  Aidan C. Parte,et al.  LPSN—list of prokaryotic names with standing in nomenclature , 2013, Nucleic Acids Res..

[30]  Sampo Pyysalo,et al.  BioNLP Shared Task 2011: Supporting Resources , 2011, BioNLP@ACL.

[31]  Mihai Surdeanu,et al.  Event Extraction as Dependency Parsing for BioNLP 2011 , 2011, BioNLP@ACL.

[32]  Antonio Jimeno-Yepes,et al.  Self-training and co-training in biomedical word sense disambiguation , 2011, BioNLP@ACL.

[33]  Johanna D. Moore,et al.  Implications for Generating Clarification Requests in Task-Oriented Dialogues , 2005, ACL.

[34]  Robert Bossy,et al.  BioNLP Shared Task 2011 - Bacteria Biotope , 2011, BioNLP@ACL.

[35]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[36]  Eugene Charniak,et al.  Any Domain Parsing: Automatic Domain Adaptation for Natural Language Parsing , 2010 .

[37]  Karën Fort,et al.  BioNLP Shared Task 2011 – Bacteria Gene Interactions and Renaming , 2011, BioNLP@ACL.

[38]  Jari Björne,et al.  Scaling up Biomedical Event Extraction to the Entire PubMed , 2010, BioNLP@ACL.

[39]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[40]  Sampo Pyysalo,et al.  Overview of the Infectious Diseases (ID) task of BioNLP Shared Task 2011 , 2011, BioNLP@ACL.

[41]  BMC Bioinformatics , 2005 .

[42]  Sophia Ananiadou,et al.  BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing , 2012 .

[43]  Claire Nedellec,et al.  BioNLP 2011 Task Bacteria Biotope – The Alvis system , 2011, BioNLP@ACL.

[44]  Akinori Yonezawa,et al.  Overview of Genia Event Task in BioNLP Shared Task 2011 , 2011, BioNLP@ACL.