Performance Studies on Distributed Virtual Screening

Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those that may bind to a drug target. Virtual screening is typically performed by docking codes, which often run sequentially. Because individual docking runs are independent of each other, the processing of huge vHTS datasets can be parallelized by splitting the data into chunks. The goal of this work is to find an optimal splitting that maximizes the speedup while taking into account the overhead and the cores available on Distributed Computing Infrastructures (DCIs). We conducted thorough performance studies that account not only for the runtime of the docking itself but also for structure preparation. The studies were carried out via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid), using benchmark datasets for protein kinases as input. Our results show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly.
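The trade-off between chunk count, per-job overhead, and available cores can be illustrated with a minimal Python sketch. It is not the workflow or the cost model used in this work: the function names, the fixed per-job overhead `t_overhead`, the constant per-ligand docking time `t_dock`, and the assumption that jobs run in waves of at most `cores` are all hypothetical simplifications chosen for illustration.

```python
import math


def split_into_chunks(items, n_chunks):
    """Split items into contiguous chunks whose sizes differ by at most one."""
    n_chunks = max(1, min(n_chunks, len(items)))
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def estimated_wall_time(n_ligands, n_chunks, cores, t_dock, t_overhead):
    """Toy model (assumption): every chunk is one job that costs a fixed
    overhead plus t_dock per ligand; jobs run in waves of at most `cores`,
    and each wave takes as long as the largest chunk."""
    largest_chunk = math.ceil(n_ligands / n_chunks)
    waves = math.ceil(n_chunks / cores)
    return waves * (t_overhead + largest_chunk * t_dock)


def best_chunk_count(n_ligands, cores, t_dock, t_overhead):
    """Brute-force search for the chunk count with the lowest modelled wall time."""
    return min(
        range(1, n_ligands + 1),
        key=lambda n: estimated_wall_time(n_ligands, n, cores, t_dock, t_overhead),
    )


if __name__ == "__main__":
    ligands = [f"ligand_{i}" for i in range(1000)]  # placeholder identifiers
    n = best_chunk_count(len(ligands), cores=10, t_dock=1.0, t_overhead=5.0)
    serial = estimated_wall_time(len(ligands), 1, 10, 1.0, 5.0)
    parallel = estimated_wall_time(len(ligands), n, 10, 1.0, 5.0)
    print(f"chunks={n}, modelled speedup={serial / parallel:.2f}")
```

Under this toy model, using more chunks than cores only adds overhead, while fewer chunks leave cores idle. On a real DCI, queueing delays, data staging, and varying docking times per ligand would shift the optimum, which is why measured performance studies are needed.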
