Extraction of Genic Interactions with the Recursive Logical Theory of an Ontology

We introduce an Information Extraction (IE) system that uses the logical theory of an ontology as a generalisation of typical information extraction patterns to extract biological interactions from text. This provides inference capabilities beyond current approaches. First, our system can handle multiple relations. Second, it can handle dependencies between relations by deriving new relations from previously extracted ones, using inference at a semantic level. Third, it supports recursive and mutually recursive rules. In this context, automatically acquiring the resources of an IE system becomes an ontology learning task: the terms, synonyms, conceptual hierarchy, relational hierarchy, and logical theory of the ontology all have to be acquired. We focus on the last of these, because learning the logical theory of an ontology, and a fortiori a recursive one, remains a seldom-studied problem. We validate our approach by using a relational learning algorithm that handles recursion to learn a recursive logical theory from a text corpus on the bacterium Bacillus subtilis. This theory achieves good recall and precision for the ten defined semantic relations, with a global recall of 67.7% and a precision of 75.5%. More importantly, it captures complex mutually recursive interactions that were implicitly encoded in the ontology.
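The abstract does not specify the rule formalism or the inference engine, so the following is only a minimal sketch under stated assumptions. It assumes a Datalog-style encoding in which parsed sentences become ground facts and the ontology's logical theory is a set of rules. The predicate names (`subj`, `obj`, `coord`, `interaction_verb`, `agent`, `target`, `interacts`), the gene names, and the example parse are all hypothetical. Evaluation uses naive bottom-up fixpoint iteration, a standard technique that is not necessarily the authors' engine. It shows a rule deriving a relation (`interacts`) from previously extracted ones (`agent`, `target`), and a recursive rule propagating the agent role along a coordination chain.

```python
from typing import Dict, List, Optional, Set, Tuple

# An atom is (predicate, term, ...); terms starting with "?" are variables.
Atom = Tuple[str, ...]
Subst = Dict[str, str]
Rule = Tuple[Atom, List[Atom]]  # (head, body); head variables must occur in the body


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(atom: Atom, fact: Atom, subst: Subst) -> Optional[Subst]:
    """Extend `subst` so that `atom` matches the ground `fact`, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            # Bind an unbound variable, or check consistency with its binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def substitute(atom: Atom, subst: Subst) -> Atom:
    return (atom[0],) + tuple(subst[t] if is_var(t) else t for t in atom[1:])


def apply_rule(rule: Rule, facts: Set[Atom]) -> Set[Atom]:
    """Return every head instance whose body is satisfied by `facts`."""
    head, body = rule
    substs: List[Subst] = [{}]
    for atom in body:  # join the body atoms left to right
        next_substs: List[Subst] = []
        for s in substs:
            for fact in facts:
                s2 = match(atom, fact, s)
                if s2 is not None:
                    next_substs.append(s2)
        substs = next_substs
    return {substitute(head, s) for s in substs}


def fixpoint(rules: List[Rule], facts: Set[Atom]) -> Set[Atom]:
    """Naive bottom-up evaluation: re-apply every rule until no new fact appears."""
    known = set(facts)
    while True:
        new: Set[Atom] = set()
        for rule in rules:
            new |= apply_rule(rule, known)
        if new <= known:
            return known
        known |= new


# Hypothetical parse of "GeneA, GeneB and GeneD activate GeneC".
FACTS: Set[Atom] = {
    ("subj", "activate", "GeneA"),
    ("obj", "activate", "GeneC"),
    ("coord", "GeneA", "GeneB"),
    ("coord", "GeneB", "GeneD"),
    ("interaction_verb", "activate"),
}

# Hypothetical rules standing in for a learned logical theory.
RULES: List[Rule] = [
    (("agent", "?V", "?X"), [("subj", "?V", "?X"), ("interaction_verb", "?V")]),
    (("target", "?V", "?Y"), [("obj", "?V", "?Y"), ("interaction_verb", "?V")]),
    # Recursive: a conjunct coordinated with an agent is also an agent.
    (("agent", "?V", "?Y"), [("agent", "?V", "?X"), ("coord", "?X", "?Y")]),
    (("target", "?V", "?Y"), [("target", "?V", "?X"), ("coord", "?X", "?Y")]),
    # Derived from previously extracted relations.
    (("interacts", "?X", "?Y"), [("agent", "?V", "?X"), ("target", "?V", "?Y")]),
]

derived = fixpoint(RULES, FACTS)
print(sorted(f[1:] for f in derived if f[0] == "interacts"))
# [('GeneA', 'GeneC'), ('GeneB', 'GeneC'), ('GeneD', 'GeneC')]
```

Dropping the recursive `agent` rule leaves only `GeneA` as an agent, so the interactions of `GeneB` and `GeneD` are lost. Because every rule is re-applied in each round, a mutually recursive rule set is evaluated the same way with no special handling; naive evaluation does recompute old facts each round, and semi-naive evaluation is the usual refinement.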
