A multi-ontology approach to annotate scientific documents based on a modularization technique

Annotating scientific text has become an important task for biomedical scientists, and there is a growing need for intelligent systems that support new scientific findings. Public databases on the Web provide useful data, but much more information is accessible only in scientific texts. Text annotation can help, because it relies on ontologies to keep annotations consistent with a uniform vocabulary. However, ontologies are difficult to use, especially those that cover a large domain. Scientific texts also span multiple domains, each covered by a distinct ontology, which makes the task even harder. Moreover, there are dozens of ontologies in the biomedical area, and they usually contain a large number of concepts. In this context, ontology modularization can be useful. This work presents an approach to annotating scientific documents with modules of different ontologies, built using a module extraction technique. The main idea is to analyze a set of single-ontology annotations on a text to infer the user's interests. Based on these annotations, a set of modules is extracted from a set of distinct ontologies and made available to the user for complementary annotation. The reduced size and narrower focus of the extracted modules tend to make the annotation task easier. An experiment was conducted to evaluate this approach, with the participation of a bioinformatics specialist from the Laboratory of Peptides and Proteins of IOC/Fiocruz, who was interested in discovering new drug targets to combat tropical diseases.
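
The pipeline outlined in the abstract (existing annotations suggest the user's interests, which then drive module extraction from several ontologies) can be illustrated with a minimal sketch. The abstract does not specify the extraction technique, so everything here is an assumption made for illustration: the toy ontologies, the word-overlap matching in `find_seeds`, and the ancestor-closure traversal in `extract_module` are hypothetical stand-ins, not the authors' method.

```python
from collections import deque

# Toy ontologies (hypothetical data): concept id -> (label, parent ids).
ONTOLOGIES = {
    "chem": {
        "C1": ("chemical entity", []),
        "C2": ("enzyme inhibitor", ["C1"]),
        "C3": ("protease inhibitor", ["C2"]),
        "C4": ("solvent", ["C1"]),
    },
    "seq": {
        "S1": ("sequence feature", []),
        "S2": ("polypeptide region", ["S1"]),
        "S3": ("protease cleavage site", ["S2"]),
        "S4": ("intron", ["S1"]),
    },
}


def find_seeds(ontology, annotation_terms):
    """Return ids of concepts whose label shares a word with any annotation term.

    Assumption: naive word overlap stands in for whatever matching the
    real approach uses to relate annotations to other ontologies.
    """
    words = {w for term in annotation_terms for w in term.lower().split()}
    return {
        cid
        for cid, (label, _parents) in ontology.items()
        if words & set(label.lower().split())
    }


def extract_module(ontology, seeds):
    """Collect the seed concepts plus all of their ancestors.

    Assumption: an upward traversal is used as a simple example of
    module extraction; other techniques would select different concepts.
    """
    module = set(seeds)
    queue = deque(seeds)
    while queue:
        cid = queue.popleft()
        for parent in ontology[cid][1]:
            if parent not in module:
                module.add(parent)
                queue.append(parent)
    return module


def modules_for_annotations(ontologies, annotation_terms):
    """Extract one module per ontology from the same set of annotation terms."""
    return {
        name: extract_module(onto, find_seeds(onto, annotation_terms))
        for name, onto in ontologies.items()
    }


if __name__ == "__main__":
    # Terms taken from a hypothetical single-ontology annotation of a text.
    for name, module in modules_for_annotations(ONTOLOGIES, ["protease"]).items():
        print(name, sorted(module))
```

In this toy run, each module keeps only the branch related to the annotated term and leaves out unrelated concepts such as `C4` and `S4`, which is the kind of size reduction the abstract argues makes complementary annotation easier.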
