Arrow: Low-Level Augmented Bayesian Optimization for Finding the Best Cloud VM

With the advent of big data applications, which tend to have long execution times, choosing the right cloud VM has significant performance and economic implications. For example, in our large-scale empirical study of 107 workloads on three popular big data systems, we found that a wrong choice can lead to a 20x slowdown or a 10x increase in cost. Bayesian optimization is a technique for optimizing expensive black-box functions. Previous work has used only instance-level information (such as core count and memory size), which is not sufficient to characterize the search space. In this work, we show that relying on instance-level information alone can lead to a fragility problem: the search either incurs a high cost or settles on a sub-optimal solution. The central insight of this paper is to use low-level performance information to augment the Bayesian optimization process. Our novel low-level augmented Bayesian optimization is rarely worse than current practices and often performs much better (in 46 of 107 cases). Furthermore, it significantly reduces the search cost in nearly half of our case studies. Based on this work, we conclude that it is often insufficient to use general-purpose off-the-shelf methods for configuring cloud instances without augmenting those methods with essential systems knowledge such as CPU utilization, working memory size, and I/O wait time.
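To make the central insight concrete, the following minimal sketch shows one way a Gaussian-process surrogate can be augmented with low-level metrics (CPU utilization, working-set size, I/O wait) observed from each measured VM. The VM catalogue, the synthetic run_workload function, and the metric-imputation step are illustrative assumptions for this sketch, not the paper's actual implementation.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import LinearRegression

# Hypothetical VM catalogue: each row is [cores, memory_GB] for one instance type.
VM_TYPES = np.array([[2, 4], [2, 8], [4, 8], [4, 16], [8, 16], [8, 32], [16, 64]], dtype=float)

def run_workload(vm):
    """Stand-in for actually executing the benchmark on a VM (synthetic here).
    Returns (runtime_seconds, low_level_metrics), where the metrics are
    [cpu_utilization, working_set_GB, io_wait_fraction]."""
    cores, mem = vm
    rng = np.random.default_rng(int(cores * 100 + mem))
    runtime = 600.0 / cores + 200.0 / mem + rng.normal(0.0, 5.0)
    metrics = np.array([min(1.0, 4.0 / cores), min(mem, 12.0), max(0.0, 0.3 - mem / 200.0)])
    return runtime, metrics

def expected_improvement(mu, sigma, best_so_far):
    """Expected-improvement acquisition for minimizing predicted runtime."""
    sigma = np.maximum(sigma, 1e-9)
    z = (best_so_far - mu) / sigma
    return (best_so_far - mu) * norm.cdf(z) + sigma * norm.pdf(z)

evaluated, runtimes, low_level = [], [], []
for idx in (0, len(VM_TYPES) - 1):           # seed the search with two measurements
    t, m = run_workload(VM_TYPES[idx])
    evaluated.append(idx)
    runtimes.append(t)
    low_level.append(m)

for _ in range(4):                            # fixed measurement budget
    X_seen = VM_TYPES[evaluated]
    # Augment instance-level features with the low-level metrics observed so far.
    X_aug = np.hstack([X_seen, np.array(low_level)])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_aug, runtimes)

    # Unmeasured VMs have no low-level metrics yet, so impute them from
    # instance-level features (a simple linear model; one choice among many).
    imputer = LinearRegression().fit(X_seen, np.array(low_level))
    candidates = [i for i in range(len(VM_TYPES)) if i not in evaluated]
    X_cand = np.hstack([VM_TYPES[candidates], imputer.predict(VM_TYPES[candidates])])

    mu, sigma = gp.predict(X_cand, return_std=True)
    pick = candidates[int(np.argmax(expected_improvement(mu, sigma, min(runtimes))))]
    t, m = run_workload(VM_TYPES[pick])
    evaluated.append(pick)
    runtimes.append(t)
    low_level.append(m)

best = evaluated[int(np.argmin(runtimes))]
print("best VM found:", VM_TYPES[best], "runtime (s):", round(min(runtimes), 1))

The difference from plain Bayesian optimization is the augmented feature vector: the surrogate is trained on instance-level plus measured low-level features, which is one way to let the model separate VM types that look alike at the instance level but behave very differently for a given workload.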
