Drugsniffer: An Open Source Workflow for Virtually Screening Billions of Molecules for Binding Affinity to Protein Targets

The SARS-CoV-2 pandemic has highlighted the importance of efficient and effective methods for identifying therapeutic drugs, and in particular has laid bare the need for methods that can explore the full diversity of synthesizable small molecules. While classical high-throughput screening methods may consider up to millions of molecules, virtual screening methods promise to enable appraisal of billions of candidate molecules, expanding the search space while reducing costs and speeding discovery. Here, we describe a new screening pipeline, called drugsniffer, that can rapidly explore drug candidates from a library of billions of molecules and is designed to support distributed computation on cluster and cloud resources. As an example of performance, our pipeline required ∼40,000 total compute hours to screen a library of ∼3.7 billion candidate molecules for potential drugs targeting three SARS-CoV-2 proteins.
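To put the quoted performance figure in perspective, the short sketch below works out the amortized compute cost implied by the two approximate numbers in the abstract. It is back-of-the-envelope arithmetic only: the constant and function names are illustrative, and the result is an average over the whole library. It does not assume that every molecule goes through every stage of the pipeline, since the abstract does not say how the compute hours are distributed.

```python
# Approximate figures quoted in the abstract.
TOTAL_COMPUTE_HOURS = 40_000   # total across all three protein targets
LIBRARY_SIZE = 3.7e9           # candidate molecules in the library
NUM_TARGETS = 3                # SARS-CoV-2 proteins screened


def amortized_seconds_per_molecule(hours: float, molecules: float) -> float:
    """Average compute seconds spent per library molecule (illustrative helper)."""
    return hours * 3600 / molecules


# Average cost per library molecule, summed over all three targets.
per_molecule = amortized_seconds_per_molecule(TOTAL_COMPUTE_HOURS, LIBRARY_SIZE)

# Average cost per (molecule, target) pair.
per_pair = per_molecule / NUM_TARGETS

print(f"{per_molecule:.3f} s per library molecule")
print(f"{per_pair:.3f} s per molecule-target pair")
```

Under these figures the average comes to roughly 0.04 compute seconds per library molecule, or about 0.013 seconds per molecule-target pair.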
