Autonomic management of application workflows on hybrid computing infrastructure

In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics so that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources ranging from high-end HPC systems to commodity clusters and clouds. For example, the framework can provision the appropriate mix of resources based on application requirements and constraints. It also monitors system and application state and adapts the application and/or resources to respond to changing requirements or environments. To demonstrate the operation of the framework and to evaluate its effectiveness, we employ a workflow used to characterize an oil reservoir, executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different application objectives, such as acceleration, conservation, and resilience, can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high-performance computing infrastructure.
