Exploring Cloud Computing for Large-Scale Scientific Applications

This paper explores cloud computing for large-scale, data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on demand, which relieves users of the burden of acquiring and maintaining a large pool of resources that a scientific application may use only once. However, unlike typical commercial applications, which often require only a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data, in the terabyte or even petabyte range, and require specialized high-performance hardware with low-latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that dynamically selects high-performance computing hardware across institutions and dynamically adapts the computation to the selected resources to achieve high performance. We demonstrate the effectiveness of our infrastructure by building a systems biology application and an uncertainty quantification application for carbon sequestration, both of which efficiently utilize data and computation resources across several institutions.
