A systematic model building process for predicting actionable static analysis alerts

Automated static analysis tools can identify potential source code anomalies, such as null pointer dereferences, buffer overflows, and unclosed streams, that could lead to field failures. These anomalies, which we call alerts, require inspection by a developer to determine whether an alert is important enough to fix. Actionable alert identification techniques can supplement automated static analysis tools by classifying or prioritizing the generated alerts, increasing the likelihood that a developer inspects actionable alerts first. By classifying and prioritizing actionable static analysis alerts, developers can focus their time on inspecting and fixing actionable alerts rather than inspecting and suppressing unactionable alerts. The goal of my research is to reduce inspection time by accurately predicting actionable and unactionable alerts when using static analysis through the creation and validation of a systematic actionable alert identification model. The Systematic Actionable Alert Identification (SAAI) process uses machine learning to identify actionable alerts. Investigation of the following three hypotheses informs the goal of my research: (1) Hypothesis 1: The artifact characteristics of an alert and the surrounding source code are predictive of the actionability of the alert. (2) Hypothesis 2: A systematic actionable alert identification technique using machine learning can accurately identify actionable alerts. (3) Hypothesis 3: A systematic actionable alert identification technique using machine learning is project specific.

A benchmark, FAULTBENCH, provides the evaluation framework for the proposed SAAI model building process and for comparison with other actionable alert identification techniques. The dissertation presents a feasibility study and three empirical studies evaluating the hypotheses above. The feasibility study evaluates an adaptive actionable alert identification technique that uses the alert's type and code location, in addition to developer feedback, to prioritize actionable alerts. The first empirical study investigates Hypotheses 1–3 using FAULTBENCH on 15 SAAI models generated from five treatments for each of three subject programs; the treatments considered different groupings of alerts within revisions for training and testing SAAI. The second empirical study is a comparative evaluation of the generated SAAI models against other actionable alert identification techniques, further evaluating Hypothesis 2. Additionally, an empirical user study was conducted in which students in a senior capstone project course used a custom SAAI model during the development of their software project.

The selection of predictive artifact characteristics as part of the SAAI process supports Hypothesis 1: all but four of the 58 artifact characteristics used to build SAAI models appeared in one or more of the selected artifact characteristic subsets. The SAAI models identified actionable and unactionable alerts with greater than 90% accuracy for eight of the 15 FAULTBENCH subject treatments, and a comparison of SAAI models with other actionable alert identification techniques from the literature found that SAAI models had the highest accuracy for 11 of the 15 treatments when classifying the full alert sets; both results support Hypothesis 2. Because applying the artifact characteristic subsets and machine learning algorithms selected for one subject program to another subject program still yielded accuracies greater than 90%, Hypothesis 3 is not supported for the evaluated subject programs.
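
To make the model building idea concrete, the following is a minimal sketch, not the dissertation's implementation, of the SAAI-style approach: select a predictive subset of artifact characteristics, train a classifier on historical alerts with known outcomes, and classify new alerts as actionable or unactionable. It assumes scikit-learn and hypothetical, numerically encoded features; the actual SAAI process selects artifact characteristic subsets and machine learning algorithms per treatment.

    # Minimal sketch of the SAAI idea (illustrative assumptions, not the
    # dissertation's implementation): feature selection over artifact
    # characteristics, then a classifier predicting alert actionability.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, f_classif

    def train_saai_model(features: np.ndarray, labels: np.ndarray, k: int = 10):
        """features: one row per historical alert (hypothetical characteristics
        such as alert type, file churn, method length); labels: 1 = actionable
        (alert was fixed), 0 = unactionable (alert was suppressed)."""
        selector = SelectKBest(f_classif, k=min(k, features.shape[1]))
        selected = selector.fit_transform(features, labels)
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(selected, labels)
        return selector, model

    def classify_alerts(selector, model, new_alert_features: np.ndarray):
        """Return 1 (actionable) or 0 (unactionable) for each new alert."""
        return model.predict(selector.transform(new_alert_features))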
The contributions of this work are as follows: (1) a systematic actionable alert identification model building process to predict actionable and unactionable automated static analysis alerts; (2) a benchmark, FAULTBENCH, for evaluating and comparing actionable alert identification techniques; and (3) a comparative evaluation of systematic actionable alert identification models with other actionable alert identification techniques from the literature.
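
As an illustration of how a FAULTBENCH-style comparison of techniques might be scored, the sketch below computes classification accuracy for competing techniques over an oracle-labeled alert set. The alert representation and technique interface here are assumptions made for illustration, not FAULTBENCH's actual interface.

    # Sketch of comparing actionable alert identification techniques by accuracy
    # on a benchmark alert set with known actionability labels (the oracle).
    # The Alert representation and Technique interface are illustrative assumptions.
    from typing import Callable, Dict, List, Tuple

    Alert = Dict[str, object]            # e.g., {"type": "NP_NULL_DEREF", "file": "Foo.java"}
    Technique = Callable[[Alert], bool]  # True if the alert is predicted actionable

    def accuracy(technique: Technique, oracle: List[Tuple[Alert, bool]]) -> float:
        """Fraction of alerts whose prediction matches the oracle label."""
        if not oracle:
            return 0.0
        correct = sum(1 for alert, actionable in oracle
                      if technique(alert) == actionable)
        return correct / len(oracle)

    def compare(techniques: Dict[str, Technique],
                oracle: List[Tuple[Alert, bool]]) -> Dict[str, float]:
        """Rank competing techniques by classification accuracy, best first."""
        scores = {name: accuracy(t, oracle) for name, t in techniques.items()}
        return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))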

[1]  Sarah Smith Heckman,et al.  A Model Building Process for Identifying Actionable Static Analysis Alerts , 2009, 2009 International Conference on Software Testing Verification and Validation.

[2]  Sarah Smith Heckman,et al.  A systematic literature review of actionable alert identification techniques for automated static code analysis , 2011, Inf. Softw. Technol..

[3]  Sarah Smith Heckman,et al.  Using groupings of static analysis alerts to identify files likely to contain field failures , 2007, ESEC-FSE '07.

[4]  Laurie A. Williams,et al.  OpenSeminar: Web-based Collaboration Tool for Open Educational Resources , 2005, 2005 International Conference on Collaborative Computing: Networking, Applications and Worksharing.

[5]  Laurie Williams,et al.  A measurement framework of alert characteristics for false positive mitigation models , 2008 .

[6]  Sarah Smith Heckman,et al.  Identifying fault-prone files using static analysis alerts through singular value decomposition , 2007, CASCON.

[7]  Laurie A. Williams,et al.  Resources for Agile Software Development in the Software Engineering Course , 2005, 18th Conference on Software Engineering Education & Training (CSEET'05).

[8]  Sarah Smith Heckman Adaptive Probabilistic Model for Ranking Code-Based Static Analysis Alerts , 2007, 29th International Conference on Software Engineering (ICSE'07 Companion).

[9]  Sarah Smith Heckman Adaptively ranking alerts generated from automated static analysis , 2007, ACM Crossroads.

[10]  Laurie Williams,et al.  Automated Adaptive Ranking and Filtering of Static Analysis Alerts , 2006 .

[11]  Anthony Potoczniak,et al.  Five Points of Connectivity. , 2005 .

[12]  Sarah Smith Heckman,et al.  Teaching second-level Java and software engineering with Android , 2011, 2011 24th IEEE-CS Conference on Software Engineering Education and Training (CSEE&T).

[13]  Michael Rappa,et al.  Open Course Resources as Part of the OpenSeminar in Software Engineering , 2006, 19th Conference on Software Engineering Education & Training (CSEET'06).

[14]  Sarah Smith Heckman,et al.  On establishing a benchmark for evaluating static analysis alert prioritization and classification techniques , 2008, ESEM '08.