Interoperable chemical structure search service

Motivation: The existing connections between large databases of chemicals, proteins, metabolites and assays offer valuable resources for research in fields ranging from drug design to metabolomics. Transparent search across multiple databases provides a way to utilize these resources efficiently. To simplify such searches, many databases have adopted semantic technologies that allow interoperable querying of the datasets using the SPARQL query language. However, the interoperable interfaces of the chemical databases still lack structure-driven chemical search, which is a fundamental method of data discovery in the chemical search space.

Results: We present a SPARQL service that augments existing semantic services by making interoperable substructure and similarity searches in small-molecule databases possible. The service thus offers new possibilities for querying interoperable databases, and simplifies the writing of heterogeneous queries that include chemical-structure search terms.

Availability: The service is freely available and accessible through a standard SPARQL endpoint interface. The service documentation and user-oriented demonstration interfaces that allow quick explorative querying of datasets are available at https://idsm.elixir-czech.cz.
