Automatic Extraction of Information about the Molecular Interactions in Biological Pathways from Texts Based on Ontology and Semantic Processing

We developed a framework that uses ontology inference and semantic processing techniques to help biologists extract knowledge directly from the large body of biological literature in NCBI PubMed. The system integrates several shareable thesauri and ontologies, namely WordNet, MeSH (Medical Subject Headings), and GO (Gene Ontology), to support automatic semantic annotation and analysis. Ontological inference facilitates the natural language processing and semantic processing, so that the system can automatically extract the correct molecular interactions from the complex sentences of an abstract. This not only saves biologists time and effort in constructing and analyzing biological pathways, but also helps them discover novel molecular interactions by comparing the information extracted from the literature with that in existing pathway databases such as KEGG. We evaluated the system's performance on pathways in the apoptosis domain.
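The comparison step mentioned above, checking literature-derived interactions against an existing pathway database, can be illustrated with a minimal sketch. This is not the system's implementation: the `Interaction` triple, the `normalize` and `novel_interactions` helpers, and the placeholder molecule names are all assumptions introduced here for illustration, and a real pipeline would map names to database identifiers rather than compare lowercase strings.

```python
from typing import Iterable, List, NamedTuple


class Interaction(NamedTuple):
    """Hypothetical (agent, relation, target) triple for one extracted interaction."""
    agent: str
    relation: str
    target: str


def normalize(item: Interaction) -> Interaction:
    # Assumption: case and surrounding whitespace are not significant.
    return Interaction(*(field.strip().lower() for field in item))


def novel_interactions(extracted: Iterable[Interaction],
                       known: Iterable[Interaction]) -> List[Interaction]:
    """Return extracted interactions absent from the known set, in first-seen order."""
    known_set = {normalize(k) for k in known}
    seen = set()
    result = []
    for item in extracted:
        key = normalize(item)
        if key not in known_set and key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Placeholder data only; these are not real molecules or database records.
known_db = [Interaction("ProteinA", "activates", "ProteinB")]
from_text = [
    Interaction("proteinA", "activates", "proteinB"),   # already in the database
    Interaction("ProteinC", "inhibits", "ProteinA"),    # candidate novel interaction
    Interaction("ProteinC ", "inhibits", "ProteinA"),   # duplicate mention
]
candidates = novel_interactions(from_text, known_db)
```

Here `candidates` holds only the interaction that the placeholder database lacks; such candidates would still need review by a biologist before being treated as new findings.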
