Predicting cloud performance for HPC applications before deployment

Abstract To reduce the capital investment required to acquire and maintain a high-performance computing (HPC) cluster, many HPC users today are moving to the cloud. When deploying an application in the cloud, users may (a) fail to understand how the application interacts with the software layers implementing the cloud system, (b) be unaware of some hardware details of the cloud system, and (c) fail to understand how sharing part of the cloud system with other users might degrade application performance. These misunderstandings may lead users to select cloud configurations that are suboptimal in cost or performance. In this work we propose a machine-learning methodology that supports the user in selecting the best cloud configuration for the target workload before deploying it in the cloud. This lets the user decide whether and what to buy before facing the cost of porting and analyzing the application in the cloud. We couple a cloud-performance-prediction model (CP) on the cloud-provider side with a hardware-independent profile-prediction model (PP) on the user side. PP captures the application-specific scaling behavior: the user profiles the target application while processing small datasets on small machines they own, and applies machine learning to generate PP, which predicts the profiles for the larger datasets to be processed in the cloud. CP is generated by the cloud provider to learn the relationships between the hardware-independent profile and cloud performance, starting from observations gathered by executing a set of training applications on a set of training cloud configurations. Since the profile data is hardware-independent, the user and the provider can generate their prediction models independently, possibly on heterogeneous machines. We apply the prediction models to Fortran-MPI benchmarks; the resulting relative error is below 12% for CP and below 30% for PP.
The Pareto front of cloud configurations obtained by maximizing performance and minimizing execution cost on the prediction models is at most 25% away from the actual optimal solutions.
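The profile-prediction idea can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration, not the paper's actual model: it extrapolates each hardware-independent profile metric from small training runs to a larger dataset size with a simple power-law fit in log-log space, whereas the paper applies a full machine-learning pipeline. The metric names, dataset sizes, and values are invented for the example.

```python
import math

def fit_powerlaw(sizes, values):
    """Least-squares fit of value = a * size**b in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

def predict_profile(profiles, target_size):
    """profiles: {metric: {dataset_size: measured_value}} from small runs
    on machines the user owns. Returns the extrapolated profile for
    target_size, to be handed to the provider-side CP model."""
    out = {}
    for metric, obs in profiles.items():
        sizes = sorted(obs)
        a, b = fit_powerlaw(sizes, [obs[s] for s in sizes])
        out[metric] = a * target_size ** b
    return out

# Hypothetical hardware-independent counters measured on small inputs
profiles = {
    "instructions": {1_000: 2.0e6, 2_000: 4.0e6, 4_000: 8.0e6},  # ~linear in n
    "mpi_bytes":    {1_000: 1.0e4, 2_000: 4.0e4, 4_000: 1.6e5},  # ~quadratic in n
}
pred = predict_profile(profiles, 8_000)
```

Because the extrapolated profile contains no hardware-specific quantities, it can be produced on whatever small machine the user has available.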
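Given per-configuration predictions of execution time and cost, the Pareto front mentioned above can be extracted with a plain dominance filter, sketched below. Both objectives are minimized (shorter time means higher performance); the configuration names and numbers are made up for illustration.

```python
def pareto_front(configs):
    """Keep configurations not dominated in (time, cost): a config is
    dominated if another is no worse in both objectives and strictly
    better in at least one."""
    front = []
    for c in configs:
        dominated = any(
            o["time"] <= c["time"] and o["cost"] <= c["cost"]
            and (o["time"] < c["time"] or o["cost"] < c["cost"])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical predicted (time, cost) per cloud configuration
configs = [
    {"name": "2x small", "time": 120.0, "cost": 1.0},
    {"name": "4x small", "time":  70.0, "cost": 1.8},
    {"name": "2x large", "time":  65.0, "cost": 2.5},
    {"name": "8x small", "time":  50.0, "cost": 3.6},
    {"name": "4x large", "time":  55.0, "cost": 4.0},  # dominated by "8x small"
]
front = pareto_front(configs)
```

The user would then pick a point from the front according to their own time/budget trade-off.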
