A ligand-based computational drug repurposing pipeline using KNIME and Programmatic Data Access: case studies for rare diseases and COVID-19

Biomedical information mining is increasingly recognized as a promising technique to accelerate drug discovery and development. In particular, integrative approaches that mine data from several (open) data sources have become more attractive as Application Programming Interfaces (APIs) make programmatic data access increasingly feasible. Using open data in conjunction with free, platform-independent analytic tools offers the additional advantages of flexibility, reusability, and transparency. Here, we present a strategy for performing ligand-based in silico drug repurposing with the analytics platform KNIME. We demonstrate the usefulness of the developed workflow in two use cases: a rare disease (here: Glucose Transporter Type 1 (GLUT-1) deficiency) and a new disease (here: COVID-19). The workflow includes a targeted download of data through web services, data curation, detection of enriched structural patterns, and substructure searches in DrugBank and in a recently deposited data set of antiviral drugs provided by Chemical Abstracts Service. The developed workflows, tutorials with detailed step-by-step instructions, and the information gained from analyzing data for GLUT-1 deficiency syndrome and COVID-19 are made freely available to the scientific community. Researchers can reuse the provided framework for other in silico drug repurposing projects, and it should serve as a valuable teaching resource for conveying integrative data mining strategies.
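To make the "detection of enriched structural patterns" step more concrete, the following Python sketch ranks structural patterns by how over-represented they are among active compounds. It is only an illustration under stated assumptions, not the published KNIME workflow: the function name `scaffold_enrichment`, the `min_count` cutoff, the enrichment definition (fraction of actives within a pattern divided by the fraction of actives overall), and the toy records are all hypothetical, and the pattern labels are assumed to have been computed beforehand by a cheminformatics toolkit.

```python
from collections import Counter


def scaffold_enrichment(records, min_count=2):
    """Rank structural patterns by enrichment among active compounds.

    records   : iterable of (pattern_label, is_active) pairs; the labels are
                assumed to be precomputed (e.g. scaffold SMILES strings).
    min_count : hypothetical cutoff; patterns seen fewer times are skipped.

    Returns a list of (pattern, n_total, n_active, enrichment) tuples,
    sorted by descending enrichment.
    """
    records = list(records)
    total = len(records)
    actives = sum(1 for _, is_active in records if is_active)
    if total == 0 or actives == 0:
        return []  # no baseline activity rate to compare against

    baseline = actives / total  # overall fraction of active compounds
    n_total = Counter(pattern for pattern, _ in records)
    n_active = Counter(pattern for pattern, is_active in records if is_active)

    rows = []
    for pattern, count in n_total.items():
        if count < min_count:
            continue
        fraction_active = n_active[pattern] / count
        rows.append((pattern, count, n_active[pattern], fraction_active / baseline))

    # Highest enrichment first; ties broken by frequency, then by label.
    rows.sort(key=lambda row: (-row[3], -row[1], row[0]))
    return rows


# Toy, made-up input: (pattern label, active?) pairs.
toy_records = [
    ("c1ccccc1", True),
    ("c1ccccc1", True),
    ("c1ccccc1", False),
    ("C1CCCCC1", False),
    ("C1CCCCC1", False),
    ("c1ccncc1", True),
]

for pattern, n, n_act, enrichment in scaffold_enrichment(toy_records):
    print(f"{pattern}\tn={n}\tactives={n_act}\tenrichment={enrichment:.2f}")
```

Patterns with an enrichment above 1 occur among actives more often than expected from the overall activity rate and would be the natural candidates for the subsequent substructure searches. In practice, a statistical test and a larger minimum count would likely be needed before trusting such a ranking.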
