Message passing interface and multithreading hybrid for parallel molecular docking of large databases on petascale high performance computing machines

A hybrid parallel scheme that combines the message passing interface (MPI) with multithreading was implemented in the AutoDock Vina molecular docking program. The resulting program, named VinaLC, was tested on the petascale high performance computing (HPC) machines at Lawrence Livermore National Laboratory. To exploit typical cluster‐type supercomputers, the master process dispatches thousands of docking calculations to run simultaneously on thousands of slave processes. Each docking calculation occupies one slave process on one node, and within the node it runs multithreaded across multiple CPU cores with shared memory. The input and output of the program and the data handling within it were carefully designed to deal with large databases and to sustain high performance on a large number of CPU cores. Parallel performance analysis of VinaLC shows that the code scales up to more than 15K CPUs with a very low overhead cost of 3.94%. One million flexible compound docking calculations took only 1.4 h to finish on about 15K CPUs. The docking accuracy of VinaLC was validated against the DUD data set by re‐docking of X‐ray ligands and by an enrichment study. In the re‐docking, 64.4% of the top scoring poses have RMSD values under 2.0 Å. The program shows good enrichment performance on 70% of the targets in the DUD data set. An analysis of the enrichment factors calculated at various percentages of the screening database indicates that VinaLC has very good early recovery of actives. © 2013 Wiley Periodicals, Inc.
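
The abstract reports enrichment factors at various percentages of the screening database but does not state the formula. As a minimal sketch, assuming the commonly used definition (the active rate in the top-ranked fraction divided by the active rate in the whole database) and assuming that more negative docking scores rank better, the calculation could look like this. The function name `enrichment_factor` and the toy scores and labels are hypothetical and are not taken from VinaLC or the DUD data set.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float,
) -> float:
    """Enrichment factor at the given top fraction of a ranked database.

    Assumed definition (not confirmed by the abstract):
        EF = (actives in top fraction / size of top fraction)
             / (total actives / database size)
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty database with at least one active")

    # Assumption: lower (more negative) score means a better predicted binder.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])

    # Size of the top-ranked subset, at least one compound.
    n_top = max(1, round(n_total * fraction))
    hits = sum(active for _, active in ranked[:n_top])

    return (hits / n_top) / (n_actives / n_total)


# Hypothetical toy database: 10 compounds, 3 of them active.
scores = [-9.1, -8.7, -8.2, -7.9, -7.5, -7.0, -6.8, -6.1, -5.9, -5.2]
labels = [True, True, False, True, False, False, False, False, False, False]

ef_20 = enrichment_factor(scores, labels, 0.20)   # top 2 compounds
ef_100 = enrichment_factor(scores, labels, 1.00)  # whole database

print(f"EF at 20%:  {ef_20:.2f}")
print(f"EF at 100%: {ef_100:.2f}")
```

Under this definition, a value above 1 at a small fraction means actives are concentrated near the top of the ranking, which is what "early recovery of actives" refers to. Evaluating the whole database always gives 1.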
