Linked Biomedical Dataspace: Lessons Learned Integrating Data for Drug Discovery

The growing volume and heterogeneity of biomedical data sources have motivated researchers to adopt Linked Data (LD) technologies to address the resulting integration challenges and to enhance information discovery. As an integral part of the EU GRANATUM project, a Linked Biomedical Dataspace (LBDS) was developed to semantically interlink data from multiple sources and to support the design of in silico experiments for cancer chemoprevention drug discovery. The components of the LBDS enable both bioinformaticians and biomedical researchers to publish, link, query, and visually explore the heterogeneous datasets. We have extensively evaluated the usability of the entire platform. In this paper, we showcase three workflows depicting real-world scenarios in which domain users employ the LBDS to intuitively retrieve meaningful information from the integrated sources. We report the important lessons learned from the challenges we encountered and from the experience we accumulated during the collaborative process, which should make it easier for LD practitioners to create such dataspaces in other domains. We also provide a concise set of generic recommendations for developing LD platforms useful for drug discovery.
