The ChEMBL database as linked open data

Background

Making data available as Linked Data using the Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis.

Results

This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferenceable linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying.

Conclusions

We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the decision support application presented here.
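
As a rough illustration of the linking idea described above, the following minimal sketch stores a few subject-predicate-object triples in memory and follows a cross-database link. It is not the ChEMBL-RDF implementation: the `example.org` URIs, the `match` and `linked_labels` helpers, and the data are all hypothetical. Only the `owl:sameAs` and `rdfs:label` predicate URIs are standard vocabulary terms.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces for illustration; not the URIs ChEMBL-RDF actually uses.
CHEMBL = "http://example.org/chembl/"
OTHER = "http://example.org/otherdb/"

# Standard vocabulary terms (OWL and RDF Schema).
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# A tiny in-memory graph: each fact is one (subject, predicate, object) triple.
triples: Set[Triple] = {
    (CHEMBL + "molecule/m1", LABEL, "example compound"),
    (CHEMBL + "molecule/m1", SAME_AS, OTHER + "compound/42"),
    (OTHER + "compound/42", LABEL, "example compound (other database)"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


def linked_labels(subject: str) -> List[str]:
    """Follow owl:sameAs links from `subject` and collect the labels found there."""
    labels: List[str] = []
    for _, _, target in match(s=subject, p=SAME_AS):
        labels.extend(obj for _, _, obj in match(s=target, p=LABEL))
    return sorted(labels)


if __name__ == "__main__":
    print(linked_labels(CHEMBL + "molecule/m1"))
```

Because both datasets identify resources by URI, the second lookup needs no database-specific join logic; a real deployment would typically express the same pattern as a SPARQL query against a triple store.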
