Is one hyperparameter optimizer enough?

Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While widely applied in empirical software engineering, there has been little discussion of which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to a defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be "best" and, for one of the two evaluation measures studied here (the F-measure), hyperparameter optimization was no better than using default configurations in 50% of cases. We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset.
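To make the comparison concrete, below is a minimal sketch (in Python with scikit-learn) of the kind of experiment described above: a classifier with its default configuration versus the same classifier tuned by random search, both scored with the F-measure. The synthetic dataset, the choice of RandomForestClassifier, and the parameter ranges are illustrative assumptions, not the paper's actual experimental setup.

```python
# Sketch: default configuration vs. random-search tuning, scored by F-measure.
# Dataset, learner, and parameter ranges are assumptions for illustration only.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Stand-in for a defect dataset: imbalanced binary labels (buggy vs. clean).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Baseline: the learner's default configuration, no tuning.
default_model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
default_f1 = f1_score(y_test, default_model.predict(X_test))

# Tuned: random search over a small, assumed hyperparameter space.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(10, 200),
        "max_depth": randint(2, 20),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=30, scoring="f1", cv=5, random_state=0)
search.fit(X_train, y_train)
tuned_f1 = f1_score(y_test, search.best_estimator_.predict(X_test))

print(f"default F1: {default_f1:.3f}  tuned F1: {tuned_f1:.3f}")
```

Repeating such a comparison with several optimizers (grid search, differential evolution, Bayesian optimization) across several datasets is what allows one to check whether any single tuner, or tuning at all, consistently beats the defaults.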
