Learning to sample: exploiting similarities across environments to learn performance models for configurable systems

Most software systems provide options that allow users to tailor the system in terms of functionality and qualities. The increased flexibility raises challenges for understanding the configuration space and the effects of options and their interactions on performance and other non-functional properties. To identify how options and interactions affect the performance of a system, several sampling and learning strategies have recently been proposed. However, existing approaches usually assume a fixed environment (hardware, workload, software release), so learning has to be repeated whenever the environment changes. Repeating learning and measurement for each environment is expensive and often practically infeasible. Instead, we pursue a strategy that transfers knowledge across environments but sidesteps heavyweight and expensive transfer-learning strategies. Based on empirical insights about common relationships regarding (i) influential options, (ii) their interactions, and (iii) their performance distributions, our approach, L2S (Learning to Sample), selects better samples in the target environment based on information from the source environment. It progressively shrinks the configuration space and adaptively concentrates on its interesting regions. With both synthetic benchmarks and several real systems, we demonstrate that L2S outperforms state-of-the-art performance-learning and transfer-learning approaches in terms of measurement effort and learning accuracy.
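
The idea of using source-environment measurements to decide where to sample in the target environment can be illustrated with a toy sketch. This is not the L2S algorithm itself: the influence estimate (difference of mean performance with an option on versus off), the helper names `option_influence` and `guided_target_samples`, and the synthetic `source_perf` function are all assumptions made for illustration only.

```python
import itertools
from statistics import mean


def option_influence(configs, perf):
    """Estimate each binary option's influence as |mean(on) - mean(off)|."""
    n_options = len(configs[0])
    influence = []
    for i in range(n_options):
        on = [p for c, p in zip(configs, perf) if c[i] == 1]
        off = [p for c, p in zip(configs, perf) if c[i] == 0]
        influence.append(abs(mean(on) - mean(off)) if on and off else 0.0)
    return influence


def guided_target_samples(influence, top_k):
    """Enumerate all settings of the top_k most influential options.

    All other options stay switched off, so the target measurements
    concentrate on the region the source environment marks as interesting.
    """
    ranked = sorted(range(len(influence)), key=lambda i: influence[i], reverse=True)
    focus = sorted(ranked[:top_k])
    samples = []
    for bits in itertools.product([0, 1], repeat=len(focus)):
        config = [0] * len(influence)
        for idx, bit in zip(focus, bits):
            config[idx] = bit
        samples.append(tuple(config))
    return focus, samples


def source_perf(c):
    # Hypothetical source environment: options 0 and 3 dominate and interact.
    return 10 + 5 * c[0] + 3 * c[3] + 2 * c[0] * c[3] + 0.1 * c[1]


# Toy source data: every configuration of five binary options (assumed cheap here).
source_configs = list(itertools.product([0, 1], repeat=5))
source_measurements = [source_perf(c) for c in source_configs]

influence = option_influence(source_configs, source_measurements)
focus, samples = guided_target_samples(influence, top_k=2)
print("options to focus on in the target:", focus)
print("configurations to measure in the target:", samples)
```

In this sketch the target environment would be measured on only 4 of the 32 possible configurations, chosen because the source data suggests that options 0 and 3 (and their interaction) account for most of the performance variation.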
