Selecting Efficient Cloud Resources for HPC Workloads

Constant advances in CPU, storage, and network virtualization are enabling high-performance computing (HPC) applications to be executed efficiently on cloud computing systems. In this computing model, users pay only for what they use, with no need to acquire or maintain expensive computing infrastructure. Moreover, users have multiple kinds of computing resources at their disposal and can assemble computing infrastructures that fit the application's needs. Nonetheless, the available computing resources vary in price and performance, and selecting the proper resources to execute an application is of utmost importance to optimize cost and performance. In this work, we discuss the performance and cost implications of selecting different kinds of cloud resources to execute HPC workloads and show that the best resources for executing a given application depend not only on the application itself but also on the input dataset being processed. We also propose a methodology to support the selection of efficient cloud resources for these applications and show that it was able to select the best of 11 different cloud infrastructure configurations for each of 8 different benchmarks by executing each application for just a few seconds on each configuration.
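The abstract does not spell out how the short trial runs are turned into a choice of configuration. The following is a minimal sketch, assuming each configuration is scored by the cost per unit of work observed during a brief trial on the actual input dataset, and assuming that throughput over those first seconds is representative of the whole run. The names `TrialRun`, `cost_per_unit_of_work`, and `rank_configurations`, and the notion of a "work unit" (for example, a solver iteration), are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrialRun:
    """Measurements from a short partial execution on one configuration (hypothetical schema)."""
    config: str             # label of the cloud infrastructure configuration
    price_per_hour: float   # price of the whole configuration, in currency units per hour
    work_done: float        # application progress made during the trial (assumed work units)
    seconds: float          # wall-clock duration of the trial


def cost_per_unit_of_work(run: TrialRun) -> float:
    """Estimated cost of one work unit, assuming trial throughput holds for the full run."""
    throughput = run.work_done / run.seconds       # work units per second
    price_per_second = run.price_per_hour / 3600.0
    return price_per_second / throughput           # currency units per work unit


def rank_configurations(runs: List[TrialRun]) -> List[TrialRun]:
    """Order configurations from cheapest to most expensive estimated cost per work unit."""
    return sorted(runs, key=cost_per_unit_of_work)


if __name__ == "__main__":
    # Hypothetical trial measurements, for illustration only.
    trials = [
        TrialRun("config-A", price_per_hour=1.0, work_done=100, seconds=10),
        TrialRun("config-B", price_per_hour=3.0, work_done=400, seconds=10),
        TrialRun("config-C", price_per_hour=2.0, work_done=150, seconds=10),
    ]
    for run in rank_configurations(trials):
        print(f"{run.config}: {cost_per_unit_of_work(run):.2e} per work unit")
```

With these made-up numbers, the most expensive configuration per hour (config-B) ranks first because its higher throughput more than offsets its price. Because the abstract notes that the best configuration depends on the input dataset, the trial runs in such a scheme would need to use the dataset that will actually be processed.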
