Exploring Large Scale Receptor-Ligand Pairs in Molecular Docking Workflows in HPC Clouds

Computer-aided drug design techniques are important assets in the pharmaceutical industry because they support the research and development of new drugs. Molecular docking (MD) predicts the binding modes of specific compounds within the active site of target proteins. Since MD is a time-consuming process, existing approaches reduce the number of receptors or ligands in docking by evaluating only small sets of compounds. This restriction of the search space reduces the chances of uniformly covering the diverse space of compounds and misses opportunities to identify new drugs. Another difficulty with large-scale docking is analyzing the results, e.g. manually browsing all output directories to find which pairs were docked successfully. To address these issues, we explored the data provenance analysis and parallel processing capabilities of SciCumulus, a cloud-based Scientific Workflow Management System. We present SciDock, a molecular docking-based virtual screening workflow, and evaluate its execution using 10,000 receptor-ligand pairs related to protease enzymes of protozoan genomes. Using 32 cores in cloud virtual machines, SciDock reaches performance improvements of up to 95.4% when running with AutoDock and 96.1% when running with Vina. We show how data provenance improved the analysis of results and how it may indicate potential protease drug targets for protozoan treatments.
