Reducing False Alarms from an Industrial-Strength Static Analyzer by SVM

Static analysis tools are useful for finding potential bugs and security vulnerabilities in source code; however, false alarms from such tools lower their usability. To reduce various kinds of false alarms and improve the performance of these tools, we propose a machine-learning-based false alarm reduction method. Abstract syntax trees (ASTs) are used to represent structural characteristics, and a support vector machine (SVM) is used to learn a model that classifies new alarms with an estimated probability. This probability is then used to remove false alarms. To evaluate the proposed method, we performed experiments using the static analysis tool SPARROW on open-source Java projects. As a result, 37.33% of false alarms were removed, while only 3.16% of true alarms were removed.
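The abstract outlines a three-stage pipeline: ASTs capture structural characteristics, an SVM produces a probability for each new alarm, and alarms that are likely false are dropped. The Python sketch below illustrates that pipeline under stated assumptions. The bag-of-node-types features, the toy trees, the 0.5 threshold, and every function name (`ast_features`, `train_linear_svm`, `fit_platt`, `filter_alarms`, `reduction_rates`) are hypothetical and not taken from the paper or from SPARROW. It trains a linear SVM with Pegasos stochastic subgradient descent and converts its scores into probabilities with Platt-style sigmoid fitting done by plain gradient descent; the paper's actual SVM library, kernel, and features may differ.

```python
import math
import random

# Hypothetical features: counts of AST node types around an alarm (assumption).
VOCAB = ["If", "NullCheck", "Call", "Deref", "Loop"]


def ast_features(node):
    """Count node types in a (kind, [children]) tree and append a bias term."""
    counts = dict.fromkeys(VOCAB, 0)
    stack = [node]
    while stack:
        kind, children = stack.pop()
        if kind in counts:
            counts[kind] += 1
        stack.extend(children)
    return [float(counts[k]) for k in VOCAB] + [1.0]  # constant 1.0 acts as the bias


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos SGD on the L2-regularized hinge loss; y is +1 (false) or -1 (true)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # margin checked before the step
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def fit_platt(scores, y, lr=0.01, iters=5000):
    """Fit P(false | f) = 1 / (1 + exp(A*f + B)) by gradient descent on log-loss."""
    n_pos = sum(1 for yi in y if yi > 0)
    n_neg = len(y) - n_pos
    # Platt's smoothed targets keep A and B finite even on separable data.
    targets = [(n_pos + 1) / (n_pos + 2) if yi > 0 else 1 / (n_neg + 2) for yi in y]
    A, B = 0.0, math.log((n_neg + 1) / (n_pos + 1))
    for _ in range(iters):
        gA = gB = 0.0
        for f, tgt in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            gA += (tgt - p) * f  # d(log-loss)/dA
            gB += tgt - p  # d(log-loss)/dB
        A -= lr * gA
        B -= lr * gB
    return A, B


def p_false(w, A, B, x):
    return 1.0 / (1.0 + math.exp(A * dot(w, x) + B))


def filter_alarms(probs, threshold=0.5):
    """Keep (return indices of) alarms whose P(false) is below the threshold."""
    return [i for i, p in enumerate(probs) if p < threshold]


def reduction_rates(y, kept):
    """(share of false alarms removed, share of true alarms removed)."""
    kept = set(kept)
    n_false = sum(1 for yi in y if yi > 0)
    removed_false = sum(1 for i, yi in enumerate(y) if yi > 0 and i not in kept)
    removed_true = sum(1 for i, yi in enumerate(y) if yi < 0 and i not in kept)
    return removed_false / n_false, removed_true / (len(y) - n_false)


def leaf(kind):
    return (kind, [])


# Hypothetical toy data: a null-check-guarded dereference is a false alarm (+1).
train_trees = [
    ("If", [leaf("NullCheck"), leaf("Deref")]),
    ("Loop", [("If", [leaf("NullCheck"), leaf("Call"), leaf("Deref")])]),
    ("If", [leaf("NullCheck"), ("Call", [leaf("Deref")])]),
    leaf("Deref"),
    ("Call", [leaf("Deref")]),
    ("Loop", [("If", [leaf("Call")]), leaf("Deref")]),
]
train_y = [1, 1, 1, -1, -1, -1]
X = [ast_features(t) for t in train_trees]
w = train_linear_svm(X, train_y)
A, B = fit_platt([dot(w, x) for x in X], train_y)
new_alarms = [
    ("Loop", [("If", [leaf("NullCheck"), leaf("Deref")])]),  # guarded
    ("Loop", [leaf("Deref")]),  # unguarded
]
probs = [p_false(w, A, B, ast_features(t)) for t in new_alarms]
kept = filter_alarms(probs)
```

Applied to labeled alarms, `reduction_rates` computes the two kinds of figures the abstract reports: the share of false alarms removed and the share of true alarms removed. Raising the threshold keeps more alarms, trading a smaller false-alarm reduction for a lower risk of discarding true alarms.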
