IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2–3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved both to select better lead compounds, so as to improve the efficiency of later stages in the drug discovery protocol, and to identify those lead compounds more quickly. No known methodological approach can deliver this combination of higher quality and speed. Here, we describe an Integrated Modeling PipEline for COVID Cure by Assessing Better LEads (IMPECCABLE) that employs multiple methodological innovations to overcome this fundamental limitation. We also describe the computational framework that we have developed to support these innovations at scale, and characterize the performance of this framework in terms of throughput, peak performance, and scientific results. We show that individual workflow components deliver 100× to 1000× improvement over traditional methods, and that the integration of methods, supported by scalable infrastructure, speeds up drug discovery by orders of magnitude. IMPECCABLE has screened ∼10^11 ligands and has been used to discover a promising drug candidate. These capabilities have been used by the US DOE National Virtual Biotechnology Laboratory and the EU Centre of Excellence in Computational Biomedicine.
