Scalable HPC and AI Infrastructure for COVID-19 Therapeutics

COVID-19 has claimed more than 1 million lives and resulted in over 40 million infections. There is an urgent need to identify drugs that can inhibit SARS-CoV-2. In response, the DOE recently established the Medical Therapeutics project as part of the National Virtual Biotechnology Laboratory, and tasked it with creating the computational infrastructure and methods necessary to advance therapeutics development. We discuss innovations in computational infrastructure and methods that are accelerating and advancing drug design. Specifically, we describe several methods that integrate artificial intelligence and simulation-based approaches, and the design of computational infrastructure to support these methods at scale. We discuss their implementation, characterize their performance, and highlight the scientific advances that these capabilities have enabled.
