Advances in distributed computing with modern drug discovery

ABSTRACT

Introduction: Computational chemistry dramatically accelerates the drug discovery process, and high-performance computing (HPC) can be used to speed up its most expensive calculations. Because supporting a local HPC infrastructure is both costly and time-consuming, many research groups are moving from in-house solutions to remote, distributed computing platforms.

Areas covered: The authors focus on the use of distributed technologies, solutions, and infrastructures to gain access to the HPC capabilities, software tools, and datasets needed to run the complex simulations required in computational drug discovery (CDD).

Expert opinion: The use of computational tools can decrease the time to market of new drugs. HPC plays a crucial role in handling the complex algorithms and large volumes of data required to achieve specificity and avoid undesirable side effects. Distributed computing environments have clear advantages over in-house solutions in terms of cost and sustainability, and infrastructures relying on virtualization reduce set-up costs. Distributed computing resources can be difficult to access, although web-based solutions are becoming increasingly available. Using on-demand computing resources rather than free or academic ones involves a trade-off between cost-effectiveness and accessibility. Graphics processing unit (GPU) computing, with its outstanding parallel computing power, is becoming increasingly important.
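Why HPC pays off most when it targets the most expensive calculations can be made concrete with Amdahl's law: if a fraction p of a pipeline's serial runtime can be spread perfectly over N workers (cores, cluster nodes, or GPUs), the overall speedup is S(N) = 1 / ((1 - p) + p/N), which can never exceed 1 / (1 - p). The sketch below is a minimal illustration under that idealized assumption (no communication or scheduling overhead); the function names are hypothetical and the fractions in the demo are illustrative values, not measurements from any drug discovery pipeline.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the serial runtime that can be spread over
        the workers (0.0 to 1.0); the remainder stays serial, e.g. setup,
        data staging, or ranking the final results.
    workers: number of processors, nodes, or GPUs used for the parallel part.
    Assumes perfect parallel efficiency and zero communication overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be a positive integer")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the number of workers grows without limit."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0.0 else 1.0 / serial


if __name__ == "__main__":
    # Illustrative parallel fractions only, not measured values.
    for p in (0.90, 0.99):
        for n in (10, 100, 1000):
            print(f"p={p:.2f}  N={n:5d}  speedup={amdahl_speedup(p, n):7.2f}")
        print(f"p={p:.2f}  N=  inf  speedup={amdahl_limit(p):7.2f}")
```

With p = 0.90 the speedup can never exceed 10, however many nodes are added, whereas p = 0.99 raises the ceiling to 100. This is why the expensive, independent parts of a workflow (for example, scoring many compounds separately) are the natural candidates to offload to distributed or GPU resources.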
