A Large-Scale Elastic Environment for Scientific Computing

The relatively recent introduction of infrastructure-as-a-service (IaaS) clouds, such as Amazon Elastic Compute Cloud (EC2), provides users with the ability to deploy custom software stacks in virtual machines (VMs) across different cloud providers. Users can leverage IaaS clouds to create elastic environments that outsource compute and storage as needed. These environments can also adapt dynamically to demand, scaling up as demand increases and scaling down as it decreases. In this paper, we present a large-scale elastic environment that extends cluster resource managers (e.g., Torque) with IaaS resources. Our solution integrates with an open-source elastic manager, the Elastic Processing Unit (EPU), and can periodically recontextualize the environment using a lightweight REST-based recontextualization broker. We deploy the Gluster file system to provide a shared file system for all nodes in the environment. Although our implementation currently supports only Torque, we also discuss how our architecture can interface with other workflows, including Hadoop’s MapReduce workflows and Condor’s match-making and high-throughput capabilities. For evaluation, we demonstrate the ability to recontextualize 256-node environments within one second of the recontextualization period, scale to over 475 nodes in less than 15 minutes, and support parallel I/O from distributed nodes.
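
To make the scale-up and scale-down behavior described above concrete, the following is a minimal sketch of a demand-driven sizing rule. It is an illustration under stated assumptions, not the EPU's actual policy or API: the function names, the `jobs_per_node` parameter, and the `max_nodes` cap of 512 are all hypothetical, and demand is assumed to be measured as the number of queued plus running jobs reported by the resource manager.

```python
import math


def target_node_count(queued_jobs, running_jobs, jobs_per_node=1,
                      min_nodes=0, max_nodes=512):
    """Return the number of worker nodes that current demand calls for.

    Hypothetical policy: one node serves `jobs_per_node` jobs, and the
    result is clamped to the range [min_nodes, max_nodes].
    """
    demand = queued_jobs + running_jobs
    needed = math.ceil(demand / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, target_nodes):
    """Translate the gap between current and target size into an action."""
    delta = target_nodes - current_nodes
    if delta > 0:
        return ("launch", delta)      # demand increased: add IaaS nodes
    if delta < 0:
        return ("terminate", -delta)  # demand decreased: release nodes
    return ("hold", 0)


if __name__ == "__main__":
    # 10 queued and 5 running jobs at 4 jobs per node call for 4 nodes.
    target = target_node_count(queued_jobs=10, running_jobs=5, jobs_per_node=4)
    print(scaling_action(current_nodes=2, target_nodes=target))
```

Because running jobs count toward demand, the target never drops below the capacity needed for work already in progress. Choosing which specific idle nodes to release is left out of this sketch.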
