How to Recognize Actionable Static Code Warnings (Using Linear SVMs)

Static code analysis tools often generate warnings that programmers ignore. Such tools can be made more useful via data mining algorithms that select the "actionable" warnings; i.e., the warnings that are usually not ignored. But what is the best way to build those selection algorithms? To answer that question, we learned predictors for 5,675 actionable warnings seen among 31,058 static code warnings from FindBugs. Several data mining methods performed very well on this task. For example, a linear Support Vector Machine achieved median recalls of 96%, median false-alarm rates of 2%, and AUC(TNR, TPR) of over 99%. Other learners (tree-based methods and deep learning) achieved very similar results (usually within 4% or less). On investigation, we found the reason all these learners perform so well: the data is intrinsically very simple. Specifically, while our data sets have up to 58 raw features, those features can be approximated by fewer than two underlying dimensions. For such intrinsically simple data, many different kinds of learners can generate useful models with similar performance. Based on the above, we conclude that it is both simple and effective to use data mining algorithms to select "actionable" warnings from static code analysis tools. We also recommend using linear SVMs to implement that selection process since, at least in our sample, that learner ran relatively quickly and achieved the best all-around performance. Further, for any analytics task, it is important to match the complexity of the inference to the complexity of the data. For example, we would not recommend deep learning for finding actionable static code warnings, since our data is intrinsically very simple.
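The linear-SVM approach described above can be sketched with scikit-learn. This is a minimal illustration, not the paper's pipeline: the 58-column feature matrix here is synthetic (two informative features plus noise, mimicking the paper's finding that the data is intrinsically low-dimensional), and the labels are a hypothetical stand-in for "actionable" vs. ignored warnings.

```python
# Hedged sketch: a linear SVM separating "actionable" from ignorable
# warnings. All data below is synthetic; it only imitates the shape of
# the paper's setting (58 raw features, ~2 informative dimensions).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n = 1000
# Two informative latent features decide the (synthetic) label.
informative = rng.normal(size=(n, 2))
labels = (informative.sum(axis=1) > 0).astype(int)
# Pad with 56 low-variance noise columns to reach 58 raw features.
noise = rng.normal(scale=0.1, size=(n, 56))
X = np.hstack([informative, noise])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = LinearSVC(C=1.0).fit(X_tr, y_tr)
recall = recall_score(y_te, clf.predict(X_te))
print(f"recall on held-out synthetic warnings: {recall:.2f}")
```

Because the decision boundary is linear in two latent dimensions, the linear kernel suffices; on real warning data one would also tune `C` and validate across multiple train/test splits.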
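The claim that 58 raw features collapse to about two underlying dimensions can be checked with a dimensionality estimate. The paper cites a maximum-likelihood intrinsic-dimension estimator; the snippet below uses a simpler stand-in, PCA explained variance, on synthetic data deliberately generated from two latent factors.

```python
# Hedged sketch: estimating how many dimensions a 58-feature data set
# "really" has. Synthetic data: 2 latent factors lifted into 58 columns.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))     # 2 true underlying dimensions
mixing = rng.normal(size=(2, 58))      # lift into 58 raw features
X = latent @ mixing + rng.normal(scale=0.01, size=(500, 58))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
# Number of components needed to explain 95% of the variance:
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(f"components for 95% variance: {k}")
```

When a handful of components carry nearly all the variance, simple learners (such as a linear SVM) are well matched to the data, which is the paper's core argument against heavier methods like deep learning here.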
