A2Cloud‐RF: A random forest based statistical framework to guide resource selection for high‐performance scientific computing on the cloud

This article proposes a random‐forest based A2Cloud framework that matches scientific applications with Cloud providers and their instances for high performance. The framework uses four engines for this task: the PERF engine, the Cloud trace engine, the A2Cloud‐ext engine, and the random forest classifier (RFC) engine. The PERF engine profiles the application to obtain its performance characteristics, including the number of single‐precision (SP) floating‐point operations (FLOPs), double‐precision (DP) FLOPs, x87 operations, memory accesses, and disk accesses. The Cloud trace engine obtains the corresponding performance characteristics of the selected Cloud instances, including SP floating‐point operations per second (FLOPS), DP FLOPS, x87 operations per second, memory bandwidth, and disk bandwidth. The A2Cloud‐ext engine uses the application and Cloud instance characteristics to generate objective scores that represent the application‐to‐Cloud match. The RFC engine uses these objective scores to generate two types of random forests that assist users with rapid analysis: application‐specific random forests (ARFs) and application‐class based random forests. An ARF considers only the input application's characteristics to generate a random forest and provide numerical ratings for the selected Cloud instances. To generate the application‐class based random forests, the RFC engine downloads the application profiles and scores of previously tested applications that perform similarly to the input application. Using these data, the RFC engine creates a random forest for instance recommendation. We exhaustively test this framework using eight real‐world applications across 12 instances from different Cloud providers. Our tests show significant statistical agreement between the instance ratings given by the framework and the ratings obtained via actual Cloud executions.
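
To make the application‐to‐Cloud matching idea concrete, the following is a minimal sketch, not the A2Cloud‐ext scoring itself, which the abstract does not specify. It assumes a simple additive model in which each application count (as the PERF engine might report) is divided by the matching instance rate (as the Cloud trace engine might report), and the sum is treated as a rough time estimate. All class, field, and function names are hypothetical, and memory and disk activity are assumed to be expressed in bytes.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical application profile (counts over one run)."""
    sp_flops: float    # single-precision floating-point operations
    dp_flops: float    # double-precision floating-point operations
    x87_ops: float     # x87 operations
    mem_bytes: float   # bytes moved to/from memory (assumed unit)
    disk_bytes: float  # bytes moved to/from disk (assumed unit)


@dataclass
class InstanceTrace:
    """Hypothetical Cloud instance characteristics (rates per second)."""
    name: str
    sp_flops_per_s: float
    dp_flops_per_s: float
    x87_ops_per_s: float
    mem_bw: float      # memory bandwidth in bytes/s (assumed unit)
    disk_bw: float     # disk bandwidth in bytes/s (assumed unit)


def estimated_seconds(app: AppProfile, inst: InstanceTrace) -> float:
    """Assumed additive model: each count divided by the matching rate.

    This ignores overlap between compute, memory, and disk activity,
    so it is only an illustrative stand-in for an objective score.
    """
    return (
        app.sp_flops / inst.sp_flops_per_s
        + app.dp_flops / inst.dp_flops_per_s
        + app.x87_ops / inst.x87_ops_per_s
        + app.mem_bytes / inst.mem_bw
        + app.disk_bytes / inst.disk_bw
    )


def rank_instances(app: AppProfile, instances: list[InstanceTrace]) -> list[InstanceTrace]:
    """Order instances from lowest to highest estimated time."""
    return sorted(instances, key=lambda inst: estimated_seconds(app, inst))
```

In the framework described above, scores of this kind would serve as inputs to the RFC engine rather than as final ratings; the sketch only illustrates how application counts and instance rates can be combined into a comparable number.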
