Classifying False Positive Static Checker Alarms in Continuous Integration Using Convolutional Neural Networks

Static code analysis in a Continuous Integration (CI) environment can significantly improve the quality of a software system because it enables early detection of defects without any test executions or user interactions. However, being a conservative over-approximation of system behaviours, static analysis also produces a large number of false positive alarms, and identifying them takes up valuable developer time. We present an automated classifier based on Convolutional Neural Networks (CNNs). We hypothesise that many false positive alarms can be classified by identifying specific lexical patterns in the parts of the code that raised the alarm: human engineers adopt a similar tactic. We train a CNN-based classifier to learn and detect these lexical patterns, using a total of about 10K historical static analysis alarms, generated by six static analysis checkers for over 27 million LOC, together with the labels assigned by actual developers. The results of our empirical evaluation suggest that our classifier can be highly effective for identifying false positive alarms, with an average precision of 79.72% across all six checkers.
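To make the idea of detecting lexical patterns with a CNN concrete, the following is a minimal sketch and not the classifier described above. It assumes a simple regex tokeniser, fixed pseudo-random token vectors in place of learned embeddings, and untrained filters; the names `tokenize`, `embed`, `make_filters`, `conv_max_pool` and `false_positive_score`, along with all sizes, are hypothetical. It shows only the forward pass: filters slide over windows of consecutive tokens, and max-over-time pooling keeps the strongest response of each filter.

```python
import math
import random
import re

EMBED_DIM = 8      # hypothetical embedding size
FILTER_WIDTH = 3   # hypothetical: each filter spans 3 consecutive tokens
NUM_FILTERS = 4    # hypothetical number of convolution filters


def tokenize(code):
    """Split code into identifiers, numbers, and single non-space symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def embed(token):
    """Map a token to a fixed pseudo-random vector (stand-in for learned embeddings)."""
    rng = random.Random(token)  # string seeds are deterministic across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def make_filters(seed=0):
    """Create untrained filters of shape NUM_FILTERS x FILTER_WIDTH x EMBED_DIM."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
             for _ in range(FILTER_WIDTH)]
            for _ in range(NUM_FILTERS)]


def conv_max_pool(vectors, filters):
    """Slide each filter over the token sequence, apply ReLU, keep the max response."""
    padding = [0.0] * EMBED_DIM
    while len(vectors) < FILTER_WIDTH:  # pad very short inputs with zero vectors
        vectors = vectors + [padding]
    features = []
    for filt in filters:
        best = 0.0  # starting at 0 is equivalent to applying ReLU before the max
        for start in range(len(vectors) - FILTER_WIDTH + 1):
            window = vectors[start:start + FILTER_WIDTH]
            activation = sum(w * x
                             for row, vec in zip(filt, window)
                             for w, x in zip(row, vec))
            best = max(best, activation)
        features.append(best)
    return features


def false_positive_score(code, filters, out_weights, bias=0.0):
    """Return a sigmoid score in (0, 1); meaningful only after training."""
    features = conv_max_pool([embed(t) for t in tokenize(code)], filters)
    z = bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-z))


filters = make_filters()
out_weights = [0.5, -0.5, 0.25, -0.25]  # hypothetical, untrained output weights
print(false_positive_score("if (p != NULL) free(p);", filters, out_weights))
```

In a trained model, the embeddings, filters and output weights would all be fitted to the developer-assigned alarm labels; the fixed values here exist only so that the sketch runs on its own.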
