Incorporating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery

The Open PHACTS Discovery Platform aims to provide an integrated information space to advance pharmacological research in the area of drug discovery. Effective drug discovery requires comprehensive data coverage, i.e. the integration of all available sources of pharmacology data. While many relevant data sources are available on the linked open data cloud, their content needs to be combined with that of commercial datasets, and the licensing of those commercial datasets must be respected when providing access to the data. Additionally, pharmaceutical companies have built up extensive private data collections of their own, which they require to be included in their pharmacological dataspace. In this paper we discuss the challenges of incorporating private and commercial data into a linked dataspace, focusing on the modelling of these datasets and their interlinking. We also present the graph-based access control mechanism that ensures commercial and private datasets are available only to authorized users.
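
The abstract does not describe how the graph-based access control works, so the following is only a minimal sketch of the general idea, assuming each dataset is stored in its own named graph and each user carries a set of graphs they are licensed to read. The graph names, the `User` class, and the `visible_quads` function are all hypothetical and are not taken from the Open PHACTS implementation.

```python
from dataclasses import dataclass, field

# Assumption: every statement is stored as a quad
# (subject, predicate, object, graph), where the graph names the source dataset.
# All identifiers below are illustrative, not real Open PHACTS URIs.
QUADS = [
    ("cmpd:1", "ex:inhibits", "target:A", "graph:open"),
    ("cmpd:1", "ex:price", "12.50", "graph:commercial"),
    ("cmpd:2", "ex:inhibits", "target:A", "graph:private-acme"),
]

# Graphs that anyone may read (hypothetical policy).
PUBLIC_GRAPHS = {"graph:open"}


@dataclass
class User:
    name: str
    # Named graphs this user is additionally licensed to read.
    allowed_graphs: set = field(default_factory=set)


def visible_quads(quads, user):
    """Return only the quads whose graph the user is authorized to read."""
    readable = PUBLIC_GRAPHS | user.allowed_graphs
    return [q for q in quads if q[3] in readable]


anonymous = User("anonymous")
licensed = User("licensed", {"graph:commercial"})
acme = User("acme-scientist", {"graph:commercial", "graph:private-acme"})
```

Filtering at the level of whole graphs keeps the authorization check independent of the shape of individual statements: a dataset's licence is attached once to its graph rather than to every triple it contains.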
