Predicting novel drugs for SARS-CoV-2 using machine learning from a >10 million chemical space

Abstract
There is an urgent need to identify effective therapeutics for COVID-19, and we have developed a machine learning drug discovery pipeline that identifies several drug candidates. First, we collect assay data for 65 human target proteins known to interact with SARS-CoV-2 proteins, including the ACE2 receptor. Next, we train machine learning models to predict inhibitory activity and use them to screen ∼100,000 FDA-registered chemicals and approved drugs, as well as ∼14 million purchasable chemicals. We filter the predictions by estimated mammalian toxicity and vapor pressure. Volatile candidates that pass these filters are proposed as novel inhaled therapeutics, since the nasal cavity and respiratory tract are early bottlenecks for infection. We also identify candidates that act across multiple targets as promising for future analyses. We anticipate that this theoretical study can accelerate the testing of two categories of therapeutics: repurposed drugs suited to short-term approval, and novel efficacious drugs suitable for long-term follow-up.
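The filtering and multi-target prioritization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each compound already carries per-target predicted activity probabilities, an estimated LD50, and an estimated vapor pressure. The `Candidate` class, the function names, the target name `target_2`, the toy values, and all thresholds (0.5 activity cutoff, 2000 mg/kg LD50, 0.1 mmHg vapor pressure) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    """A screened compound with hypothetical model outputs (illustrative only)."""
    name: str
    # Hypothetical predicted probability of inhibitory activity, per target protein.
    target_scores: Dict[str, float]
    # Hypothetical estimated oral LD50 in mg/kg (higher means less acutely toxic).
    ld50_mg_per_kg: float
    # Hypothetical estimated vapor pressure in mmHg (higher means more volatile).
    vapor_pressure_mmhg: float


def active_targets(c: Candidate, activity_cutoff: float = 0.5) -> List[str]:
    """Return the targets whose predicted activity meets the cutoff, sorted by name."""
    return sorted(t for t, p in c.target_scores.items() if p >= activity_cutoff)


def filter_candidates(
    candidates: List[Candidate],
    activity_cutoff: float = 0.5,  # hypothetical activity threshold
    min_ld50: float = 2000.0,  # hypothetical toxicity threshold (mg/kg)
    min_vapor_pressure: Optional[float] = None,  # hypothetical; None disables the volatility filter
) -> List[Tuple[str, List[str]]]:
    """Keep predicted actives that pass the toxicity (and optional volatility)
    filters, ranked so that compounds hitting more targets come first."""
    kept = []
    for c in candidates:
        hits = active_targets(c, activity_cutoff)
        if not hits:
            continue  # predicted inactive against every target
        if c.ld50_mg_per_kg < min_ld50:
            continue  # estimated to be too toxic
        if min_vapor_pressure is not None and c.vapor_pressure_mmhg < min_vapor_pressure:
            continue  # not volatile enough to consider for inhalation
        kept.append((c.name, hits))
    # Multi-target compounds first; ties broken alphabetically by name.
    kept.sort(key=lambda item: (-len(item[1]), item[0]))
    return kept


# Toy data with made-up values, for illustration only.
candidates = [
    Candidate("cmpd_A", {"ACE2": 0.9, "target_2": 0.7}, 5000.0, 0.5),
    Candidate("cmpd_B", {"ACE2": 0.8}, 300.0, 2.0),
    Candidate("cmpd_C", {"ACE2": 0.2, "target_2": 0.6}, 2500.0, 0.001),
    Candidate("cmpd_D", {"ACE2": 0.1}, 8000.0, 1.0),
]

print(filter_candidates(candidates))
# [('cmpd_A', ['ACE2', 'target_2']), ('cmpd_C', ['target_2'])]
print(filter_candidates(candidates, min_vapor_pressure=0.1))
# [('cmpd_A', ['ACE2', 'target_2'])]
```

Sorting by the number of predicted targets surfaces multi-target compounds first, and passing `min_vapor_pressure` narrows the list to volatile compounds that could be considered for inhalation. The model-training step that would produce `target_scores` is not shown here.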
