Performance and cost analysis of the Supernova factory on the Amazon AWS cloud

Today, our picture of the Universe differs radically from that of just over a decade ago. We now know that the Universe is not only expanding, as Hubble discovered in 1929, but that the rate of expansion is accelerating, propelled by mysterious new physics dubbed “Dark Energy”. This revolutionary discovery was made by comparing the brightness of nearby Type Ia supernovae (which exploded in the past billion years) to that of much more distant ones (from up to seven billion years ago). The reliability of this comparison hinges upon a very detailed understanding of the physics of the nearby events. To further this understanding, the Nearby Supernova Factory (SNfactory) relies upon a complex pipeline of serial processes that execute various image processing algorithms in parallel on ∼10 TB of data. This pipeline has traditionally run on a local cluster. Cloud computing [Above the clouds: a Berkeley view of cloud computing, Technical Report UCB/EECS-2009-28, University of California, 2009] offers many features that make it an attractive alternative. The ability to completely control the software environment in a cloud is appealing when dealing with a community-developed science pipeline with many unique library and platform requirements. In this context, we study the feasibility of porting the SNfactory pipeline to the Amazon Web Services environment. Specifically, we describe the tool set we developed to manage a virtual cluster on Amazon EC2, explore the design options available for application data placement, and offer detailed performance results and lessons learned for each of those design options.
