Improving the extraction of complex regulatory events from scientific text by using ontology-based inference

Background: The extraction of complex events from biomedical text is a challenging task that requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short of testing the benefits of using domain knowledge.

Results: We developed a system that deduces implicit events from explicitly expressed events by using inference rules that encode domain knowledge. We evaluated the system with the inference module on three tasks. First, when tested against a corpus with manually annotated events, the inference module contributes 53.2% of the correct extractions without causing any incorrect results. Second, the system overall reproduces 33.1% of the transcription regulatory events contained in RegulonDB (with up to 85.0% precision), and the inference module is required for 93.8% of the reproduced events. Third, we applied the system with minimal adaptations to the identification of cell activity regulation events, confirming that inference also improves the system's performance on this task.

Conclusions: Our research shows that inference based on domain knowledge plays a significant role in extracting complex events from text. This approach has great potential for recognizing, in the literature, the complex concepts of biomedical ontologies such as the Gene Ontology.
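
The idea of deducing implicit events from explicit ones can be illustrated with a minimal forward-chaining sketch. It is not the system described above: the fact format, the relation names (`binds`, `promoter_of`, `regulates_transcription_of`), the single rule, and the entities `TF_A`, `site_1`, and `gene_X` are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable

# A fact is a (relation, subject, object) triple. This format is an assumption
# made for the sketch, not the representation used by the system in the paper.
Fact = tuple[str, str, str]
Rule = Callable[[set[Fact]], set[Fact]]


def binding_implies_regulation(facts: set[Fact]) -> set[Fact]:
    """Hypothetical rule: if X binds a site that is the promoter of gene G,
    infer that X regulates the transcription of G."""
    derived: set[Fact] = set()
    for rel, factor, site in facts:
        if rel != "binds":
            continue
        for rel2, site2, gene in facts:
            if rel2 == "promoter_of" and site2 == site:
                derived.add(("regulates_transcription_of", factor, gene))
    return derived


def forward_chain(facts: set[Fact], rules: list[Rule]) -> set[Fact]:
    """Apply every rule repeatedly until no new facts are derived."""
    known = set(facts)
    while True:
        new: set[Fact] = set()
        for rule in rules:
            new |= rule(known)
        new -= known
        if not new:
            return known
        known |= new


# Explicit events, as they might be extracted directly from a sentence.
explicit: set[Fact] = {
    ("binds", "TF_A", "site_1"),
    ("promoter_of", "site_1", "gene_X"),
}

# Implicit events are whatever the rules add on top of the explicit ones.
implicit = forward_chain(explicit, [binding_implies_regulation]) - explicit
print(implicit)
```

In this toy setting the regulation event is never stated directly; it appears only after the rule combines two explicit facts, which mirrors how an inference module can account for extractions that pattern matching alone would miss.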
