Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?

How can we make software analytics simpler and faster? One method is to match the complexity of the analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve the predictions generated via software analytics. Sometimes, very fast hyperparameter optimization can be achieved by just DODGE-ing away from things tried before. But when is it wise to use DODGE, and when must we use more complex (and much slower) optimizers? To answer this, we applied hyperparameter optimization to 120 SE data sets that explored bad smell detection, predicting GitHub issue close time, bug report analysis, and defect prediction, and to dozens of other non-SE problems. We find that DODGE works best for data sets with low "intrinsic dimensionality" (D = 3) and very poorly for higher-dimensional data (D over 8). Nearly all the SE data seen here was intrinsically low-dimensional, indicating that DODGE is applicable to many SE analytics tasks.
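To make "intrinsic dimensionality" concrete, the following is a minimal sketch of one common way to estimate it: a nearest-neighbor maximum likelihood estimator in the style of Levina and Bickel. This is an illustrative assumption, not necessarily the estimator used in this work. The function name `mle_intrinsic_dimension`, the neighborhood size `k`, and the synthetic data are all hypothetical choices made for the example.

```python
import math
import random


def mle_intrinsic_dimension(rows, k=10):
    """Average, over all rows, of a Levina-Bickel style local estimate.

    For each row, take the distances T_1 <= ... <= T_k to its k nearest
    neighbors and compute (k - 1) / sum_j log(T_k / T_j), j = 1..k-1.
    Illustrative sketch only; brute-force O(n^2) distance computation.
    """
    estimates = []
    for i, x in enumerate(rows):
        dists = sorted(math.dist(x, y) for j, y in enumerate(rows) if j != i)
        # Drop zero distances (duplicate rows) so the logarithm is defined.
        dists = [d for d in dists if d > 0][:k]
        if len(dists) < 2:
            continue
        t_k = dists[-1]
        log_sum = sum(math.log(t_k / t) for t in dists[:-1])
        if log_sum > 0:
            estimates.append((len(dists) - 1) / log_sum)
    return sum(estimates) / len(estimates)


def make_redundant_table(n, rng):
    """Ten columns, but only two underlying degrees of freedom (u, v)."""
    table = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        table.append([u, v] * 5)
    return table


def make_independent_table(n, rng):
    """Ten columns that all vary independently."""
    return [[rng.random() for _ in range(10)] for _ in range(n)]


rng = random.Random(1)
low = mle_intrinsic_dimension(make_redundant_table(300, rng))
high = mle_intrinsic_dimension(make_independent_table(300, rng))
print(f"redundant 10-column table:   D is about {low:.1f}")
print(f"independent 10-column table: D is about {high:.1f}")
```

Both tables have ten columns, yet the first should score a much lower D because its columns are copies of two underlying variables. In the terms used above, the first table resembles the low-dimensional case where DODGE is reported to work best, and the second resembles the higher-dimensional case.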
