Ontology learning from biomedical natural language documents using UMLS

New knowledge is generated continuously in biomedical domains, so the biomedical literature is becoming harder to follow. Ontologies standardize vocabulary and can therefore make biomedical texts easier to understand. This work presents a methodology for building biomedical ontologies from texts. The approach relies on natural language processing and incremental knowledge acquisition techniques to obtain the relevant concepts and relations to be included in an OWL ontology. Additionally, we provide an algorithm that uses UMLS to connect isolated regions of concepts in the ontology. We also discuss the experiment carried out to validate our approach and its positive results in terms of performance and scalability.
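The idea of connecting isolated concept regions can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given here. It assumes the ontology is a set of (source, label, target) triples and that an external hierarchy supplies a parent for some concepts. The `parent_of` dictionary is a hypothetical toy stand-in for such a hierarchy, not real UMLS data or a UMLS API, and all function names are illustrative.

```python
from collections import defaultdict


def connected_regions(concepts, relations):
    """Group concepts into regions joined by at least one relation."""
    neighbours = defaultdict(set)
    for source, _label, target in relations:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, regions = set(), []
    for concept in sorted(concepts):
        if concept in seen:
            continue
        stack, region = [concept], set()
        while stack:
            node = stack.pop()
            if node in region:
                continue
            region.add(node)
            stack.extend(neighbours[node] - region)
        seen |= region
        regions.append(region)
    return regions


def ancestors(concept, parent_of):
    """Ancestors of a concept, nearest first (assumes an acyclic hierarchy)."""
    chain = []
    while concept in parent_of:
        concept = parent_of[concept]
        chain.append(concept)
    return chain


def find_bridge(region_a, region_b, parent_of):
    """Return two upward paths that meet at a shared ancestor, or None.

    The first shared ancestor found is used; it is not guaranteed to be
    the closest one over all member pairs.
    """
    for a in sorted(region_a):
        chain_a = [a] + ancestors(a, parent_of)
        for b in sorted(region_b):
            chain_b = [b] + ancestors(b, parent_of)
            shared = next((c for c in chain_b if c in chain_a), None)
            if shared is not None:
                return (chain_a[:chain_a.index(shared) + 1],
                        chain_b[:chain_b.index(shared) + 1])
    return None


def bridge_regions(concepts, relations, parent_of):
    """Propose is_a triples linking every other region to the first one."""
    regions = connected_regions(concepts, relations)
    bridges = []
    for region in regions[1:]:
        paths = find_bridge(regions[0], region, parent_of)
        if paths is None:
            continue  # no shared ancestor: the region stays isolated
        for path in paths:
            for child, parent in zip(path, path[1:]):
                edge = (child, "is_a", parent)
                if edge not in bridges:
                    bridges.append(edge)
    return bridges


# Toy data, invented for illustration only.
concepts = {"aspirin", "headache", "ibuprofen"}
relations = [("aspirin", "treats", "headache")]
parent_of = {"aspirin": "nsaid", "ibuprofen": "nsaid", "nsaid": "drug"}

bridges = bridge_regions(concepts, relations, parent_of)
merged = connected_regions(concepts, relations + bridges)
```

In this toy case the two regions are joined through the intermediate concept `nsaid`, which is added to the graph only as far as the linking path requires. A real implementation would need to query an actual terminology source and decide which of several candidate ancestors to prefer.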
