Cloud computing for fast prediction of chemical activity

Quantitative Structure-Activity Relationship (QSAR) modelling is a method for building models that predict properties of chemical compounds from their structure. It is of growing importance in the design of new drugs. The quantity of data available for building models is increasing rapidly, which has the advantage that more accurate models can be created for a wider range of properties. The disadvantage, however, is that the amount of computation required for model building has also increased dramatically, so it has become vital to find a way to accelerate this process. We have achieved this by exploiting parallelism in searching the QSAR model space for the best models. This paper shows how the cloud computing paradigm can be a good fit for this approach. It describes the design and implementation of a tool for exploring the model space that exploits our e-Science Central cloud platform. We report on the scalability achieved and the experience gained in designing the solution. The acceleration and absolute performance achieved are much greater than those of existing QSAR solutions, creating the potential for new and interesting research, and for applying this approach to accelerate other types of applications.
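The abstract does not spell out how the model space is searched, so the following is only a minimal, hypothetical sketch of the general idea of a parallel model-space search. Each candidate model (here, one molecular descriptor paired with one fitting method) is fitted and scored independently, and the best-scoring candidate is kept. The synthetic dataset, the two toy methods, the fixed train/test split and every function name are illustrative assumptions, not the e-Science Central implementation. A local thread pool stands in for cloud workers. Because CPython threads do not run pure-Python code in parallel, real speed-up would need a process pool or a distributed task queue; the sketch only shows that the candidates share no state and can therefore be farmed out.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import math
import random


def make_data(n=60, seed=0):
    """Hypothetical toy dataset: three synthetic descriptors per compound.

    Real QSAR descriptors would be computed from molecular structures;
    here only descriptor 1 actually drives the activity.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = [rng.uniform(-1, 1) for _ in range(3)]
        activity = 2.0 * d[1] + rng.gauss(0, 0.1)
        rows.append((d, activity))
    return rows


def fit_linear(xs, ys):
    """Least-squares line y = a + b*x on a single descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_mean(xs, ys):
    """Baseline model: always predict the training-set mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


# Hypothetical set of modelling methods making up part of the model space.
METHODS = {"linear": fit_linear, "mean": fit_mean}


def evaluate(candidate, train, test):
    """Fit one (descriptor index, method) candidate and return its test RMSE."""
    desc, method = candidate
    model = METHODS[method]([d[desc] for d, _ in train], [y for _, y in train])
    mse = sum((model(d[desc]) - y) ** 2 for d, y in test) / len(test)
    return candidate, math.sqrt(mse)


def search(data, workers=4):
    """Evaluate every candidate concurrently and return the best (candidate, RMSE)."""
    train, test = data[:40], data[40:]
    n_desc = len(data[0][0])
    candidates = list(product(range(n_desc), METHODS))
    # Candidates are independent, so they can be dispatched to any worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: evaluate(c, train, test), candidates))
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    best, rmse = search(make_data())
    print(best, round(rmse, 3))
```

Swapping `ThreadPoolExecutor` for another executor, or replacing the pool with a queue of cloud tasks, leaves `evaluate` unchanged; that separation is what makes this kind of search a natural fit for scaling out.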