Knowledge Representation and Management: a Linked Data Perspective

INTRODUCTION Biomedical research is becoming a data-intensive science in several areas, where prodigious amounts of data are being generated that have to be stored, integrated, shared and analyzed. In an effort to improve the accessibility of data and knowledge, the Linked Data initiative proposed a well-defined set of recommendations for exposing, sharing and integrating data, information and knowledge using semantic web technologies.

OBJECTIVE The main goal of this paper is to identify the current status and future trends of knowledge representation and management in Life and Health Sciences, mostly with regard to linked data technologies.

METHODS We selected three prominent linked data studies, namely Bio2RDF, Open PHACTS and the EBI RDF platform, and then selected 14 studies published in 2014 or later that cited any of the three. We manually analyzed these 14 papers with respect to how they use linked data techniques.

RESULTS The analysis shows a trend toward using linked data techniques in Life and Health Sciences. Even though some studies do not follow all of the recommendations, many of them already represent and manage their knowledge using RDF and biomedical ontologies.

CONCLUSION RDF and biomedical ontologies are having a strong impact on how knowledge is generated from biomedical data, by making data elements increasingly connected and by providing a better description of their semantics. As health institutes become more data-centric, we believe that the adoption of linked data techniques will continue to grow and will be an effective solution for knowledge representation and management.
