BiLO-CPDP: Bi-Level Programming for Automated Model Discovery in Cross-Project Defect Prediction

Cross-Project Defect Prediction (CPDP), which borrows data from similar projects by combining a transfer learner with a classifier, has emerged as a promising way to predict software defects when the available data about the target project is insufficient. However, developing such a model is challenging because it is difficult to determine the right combination of transfer learner and classifier along with their optimal hyper-parameter settings. In this paper, we propose a tool, dubbed BiLO-CPDP, which is the first of its kind to formulate automated CPDP model discovery from the perspective of bi-level programming. In particular, bi-level programming carries out the optimization at two nested levels in a hierarchical manner. Specifically, the upper-level optimization routine is designed to search for the right combination of transfer learner and classifier, while the nested lower-level optimization routine aims to optimize the corresponding hyper-parameter settings. To evaluate BiLO-CPDP, we conduct experiments on 20 projects to compare it with a total of 21 existing CPDP techniques, along with its single-level optimization variant and Auto-Sklearn, a state-of-the-art automated machine learning tool. Empirical results show that BiLO-CPDP achieves better prediction performance than all 21 existing CPDP techniques on 70% of the projects, while being overwhelmingly superior to Auto-Sklearn and its single-level optimization variant in all cases. Furthermore, the unique bi-level formalization in BiLO-CPDP also makes it possible to allocate more budget to the upper level, which significantly boosts the performance.
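The nested structure described above can be illustrated with a minimal sketch. This is not the BiLO-CPDP implementation: the abstract does not specify which optimizers are used at either level, so the sketch assumes plain enumeration at the upper level and random search at the lower level. The search space, the names `learner_A`, `clf_X`, and so on, and the `evaluate` function are all hypothetical stand-ins for training and scoring a real CPDP model.

```python
import random

# Hypothetical search space (names and ranges are illustrative assumptions):
# each (transfer learner, classifier) pair has its own hyper-parameters.
SEARCH_SPACE = {
    ("learner_A", "clf_X"): {"alpha": (0.0, 1.0), "depth": (1.0, 10.0)},
    ("learner_A", "clf_Y"): {"c": (0.1, 10.0)},
    ("learner_B", "clf_X"): {"k": (1.0, 20.0), "depth": (1.0, 10.0)},
}

# Toy stand-in for "train the CPDP model and measure its performance":
# each combination has a peak score reached at the midpoint of its ranges.
PEAK = {
    ("learner_A", "clf_X"): 0.70,
    ("learner_A", "clf_Y"): 0.80,
    ("learner_B", "clf_X"): 0.75,
}


def evaluate(combo, params):
    """Return a toy score that is at most PEAK[combo]; higher is better."""
    penalty = 0.0
    for name, (low, high) in SEARCH_SPACE[combo].items():
        mid = (low + high) / 2.0
        penalty += ((params[name] - mid) / (high - low)) ** 2
    return PEAK[combo] - penalty


def lower_level(combo, budget, rng):
    """Lower level: random search over the hyper-parameters of one fixed combination."""
    best_params, best_score = None, float("-inf")
    for _ in range(budget):
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in SEARCH_SPACE[combo].items()
        }
        score = evaluate(combo, params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def upper_level(lower_budget, rng):
    """Upper level: score each combination by the best result of its lower-level search."""
    best = (None, None, float("-inf"))
    for combo in SEARCH_SPACE:
        params, score = lower_level(combo, lower_budget, rng)
        if score > best[2]:
            best = (combo, params, score)
    return best


if __name__ == "__main__":
    combo, params, score = upper_level(lower_budget=50, rng=random.Random(0))
    print(combo, params, round(score, 3))
```

The point of the sketch is the division of labour: the upper level never sees raw hyper-parameters, only the best score each combination reaches once its lower-level search finishes. How the evaluation budget is split between the two levels is therefore an explicit design choice, which is the property the last sentence of the abstract refers to.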
