Scout: An Experienced Guide to Find the Best Cloud Configuration

Finding the right cloud configuration for a workload is essential for achieving good performance and containing running costs. A poor choice of cloud configuration significantly decreases application performance and increases running cost. While Bayesian Optimization is effective and applicable to any workload, it is fragile because both performance and workloads are hard to model (that is, to predict). In this paper, we propose a novel method, SCOUT. The central insight of SCOUT is that using prior measurements, even those for different workloads, improves search performance and reduces search cost. At its core, SCOUT extracts search hints (inferences about resource requirements) from low-level performance metrics. Such hints enable SCOUT to navigate the search space more efficiently: only the spotlight region is searched. We evaluate SCOUT with 107 workloads on Apache Hadoop and Spark. The experimental results demonstrate that our approach finds better cloud configurations at a lower search cost than state-of-the-art methods. Based on this work, we conclude that (i) low-level performance information is necessary for finding the right cloud configuration in an effective, efficient, and reliable way, and (ii) a search method can be guided by historical data, thereby reducing cost and improving performance.
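The idea of steering a search with prior measurements can be illustrated with a small Python sketch. It is not SCOUT's algorithm, which the abstract does not specify; it is a toy nearest-neighbor vote under stated assumptions. The metric names (`cpu`, `mem`, `iowait`), the configuration labels, the history records, and the cost values are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical prior measurements taken from *other* workloads. Each record
# pairs low-level metrics observed on a probe run with the configuration
# family that turned out best for that workload. All values are made up.
HISTORY = [
    ({"cpu": 0.95, "mem": 0.30, "iowait": 0.05}, "compute-optimized"),
    ({"cpu": 0.90, "mem": 0.35, "iowait": 0.10}, "compute-optimized"),
    ({"cpu": 0.40, "mem": 0.92, "iowait": 0.05}, "memory-optimized"),
    ({"cpu": 0.35, "mem": 0.88, "iowait": 0.10}, "memory-optimized"),
    ({"cpu": 0.30, "mem": 0.40, "iowait": 0.60}, "storage-optimized"),
]

METRICS = ("cpu", "mem", "iowait")


def distance(a, b):
    """Euclidean distance between two metric dictionaries."""
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in METRICS))


def spotlight(observed, history, k=3):
    """Rank candidate configurations by votes from the k nearest records."""
    nearest = sorted(history, key=lambda rec: distance(observed, rec[0]))[:k]
    votes = {}
    for _, best in nearest:
        votes[best] = votes.get(best, 0) + 1
    # Most votes first; ties broken alphabetically for determinism.
    return sorted(votes, key=lambda c: (-votes[c], c))


def search(observed, history, run_cost, budget=2):
    """Measure only the spotlighted candidates and return the cheapest one."""
    candidates = spotlight(observed, history)[:budget]
    measured = {c: run_cost(c) for c in candidates}
    return min(measured, key=measured.get), measured


if __name__ == "__main__":
    # Metrics from one probe run of a new workload (hypothetical numbers).
    probe = {"cpu": 0.92, "mem": 0.32, "iowait": 0.07}
    # Stand-in for actually running the workload on a configuration.
    fake_costs = {
        "compute-optimized": 1.0,
        "memory-optimized": 1.8,
        "storage-optimized": 2.5,
    }
    best, measured = search(probe, HISTORY, fake_costs.__getitem__)
    print(best, measured)
```

In this toy setup the storage-optimized candidate is never measured, because no nearby historical record points to it. That is the sense in which historical data can reduce search cost: measurements are spent only on the region the hints highlight.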
