CAP3: A Cloud Auto-Provisioning Framework for Parallel Processing Using On-Demand and Spot Instances

Cloud computing has drawn increasing attention from the scientific computing community because of its ease of use, elasticity, and relatively low cost. High-performance computing (HPC) applications are usually resource-demanding, so without careful planning they can incur a high monetary cost even in the cloud. We design a tool called CAP3 (Cloud Auto-Provisioning framework for Parallel Processing) that helps a user minimize the cost of running an HPC application in the cloud while meeting a user-specified job deadline. Given an HPC application, CAP3 automatically profiles it, builds a model to predict its performance, and infers a cluster size that can finish the job within the deadline at minimum total cost. To reduce the cost further, CAP3 chooses between the cloud's reliable on-demand instances and its low-cost spot instances, depending on how tight the remaining time is relative to the application's deadline. Experiments on Amazon EC2 show that the execution strategy CAP3 produces is cost-effective because it selects an appropriate cluster size and an appropriate instance type (on-demand or spot).
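The provisioning decision described above can be illustrated with a minimal sketch. It is not CAP3's actual model or policy: the runtime formula `T(n) = serial + parallel / n + comm * n`, the per-started-hour billing, the `slack_factor` threshold, and all function and parameter names are assumptions made for illustration only.

```python
import math


def predicted_runtime_hours(n, serial_hours, parallel_hours, comm_hours_per_node):
    """Hypothetical performance model (an assumption, not CAP3's fitted model):
    T(n) = serial + parallel / n + comm * n."""
    return serial_hours + parallel_hours / n + comm_hours_per_node * n


def cheapest_cluster(deadline_hours, price_per_node_hour, max_nodes, **model):
    """Return (nodes, cost, runtime) for the cheapest size that meets the
    deadline, or None if no size up to max_nodes is predicted to meet it."""
    best = None
    for n in range(1, max_nodes + 1):
        t = predicted_runtime_hours(n, **model)
        if t > deadline_hours:
            continue  # predicted to miss the deadline
        # Assumption: each node is billed per started hour.
        cost = n * math.ceil(t) * price_per_node_hour
        if best is None or cost < best[1]:
            best = (n, cost, t)
    return best


def choose_instance_type(remaining_work_hours, time_to_deadline_hours,
                         slack_factor=1.5):
    """Illustrative rule: use spot instances while there is slack, and fall
    back to on-demand instances once the deadline becomes tight."""
    if time_to_deadline_hours >= slack_factor * remaining_work_hours:
        return "spot"
    return "on-demand"


if __name__ == "__main__":
    plan = cheapest_cluster(
        deadline_hours=5, price_per_node_hour=0.10, max_nodes=16,
        serial_hours=1, parallel_hours=16, comm_hours_per_node=0,
    )
    print(plan)
    print(choose_instance_type(remaining_work_hours=2, time_to_deadline_hours=10))
```

In this sketch the search is exhaustive over cluster sizes, which is cheap for realistic size limits. A larger cluster is not always cheaper or more expensive: it shortens the run but multiplies the hourly charge, so the cost has to be evaluated per size.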
