Topic Modeling Based Warning Prioritization from Change Sets of Software Repository

Many existing warning prioritization techniques seek to reorder the static analysis warnings such that true positives are provided first. However, excessive amount of time is required therein to investigate and fix prioritized warnings because some are not actually true positives or are irrelevant to the code context and topic. In this paper, we propose a warning prioritization technique that reflects various latent topics from bug-related code blocks. Our main aim is to build a prioritization model that comprises separate warning priorities depending on the topic of the change sets to identify the number of true positive warnings. For the performance evaluation of the proposed model, we employ a performance metric called warning detection rate, widely used in many warning prioritization studies, and compare the proposed model with other competitive techniques. Additionally, the effectiveness of our model is verified via the application of our technique to eight industrial projects of a real global company.

[1]  Hareton K. N. Leung,et al.  MSR4SM: Using topic models to effectively mining software repositories for software maintenance tasks , 2015, Inf. Softw. Technol..

[2]  Tao Xie,et al.  DSD-Crasher: A hybrid analysis tool for bug finding , 2008 .

[3]  Audris Mockus,et al.  Identifying reasons for software changes using historic databases , 2000, Proceedings 2000 International Conference on Software Maintenance.

[4]  Thomas L. Griffiths,et al.  Probabilistic Topic Models , 2007 .

[5]  Rainer Koschke,et al.  Effort-Aware Defect Prediction Models , 2010, 2010 14th European Conference on Software Maintenance and Reengineering.

[6]  Ayse Basar Bener,et al.  Defect prediction from static code features: current results, limitations, new approaches , 2010, Automated Software Engineering.

[7]  Sarah Smith Heckman,et al.  On establishing a benchmark for evaluating static analysis alert prioritization and classification techniques , 2008, ESEM '08.

[8]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Indexing , 1999, SIGIR Forum.

[9]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[10]  Kwangkeun Yi,et al.  Taming False Alarms from a Domain-Unaware C Analyzer by a Bayesian Statistical Post Analysis , 2005, SAS.

[11]  Michael D. Ernst,et al.  Prioritizing Warning Categories by Analyzing Software History , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[12]  Song Wang,et al.  Is there a "golden" feature set for static warning identification?: an experimental evaluation , 2018, ESEM.

[13]  Dawson R. Engler,et al.  Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations , 2003, SAS.

[14]  Stephen W. Thomas Mining software repositories using topic models , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[15]  Junfeng Yang,et al.  Correlation exploitation in error ranking , 2004, SIGSOFT '04/FSE-12.

[16]  Daniel Barbará,et al.  On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[17]  Letha H. Etzkorn,et al.  Configuring latent Dirichlet allocation based feature location , 2014, Empirical Software Engineering.

[18]  Francis R. Bach,et al.  Online Learning for Latent Dirichlet Allocation , 2010, NIPS.

[19]  Lin Tan,et al.  Finding patterns in static analysis alerts: improving actionable alert ranking , 2014, MSR 2014.

[20]  Sushil Krishna Bajracharya,et al.  Mining concepts from code with probabilistic topic models , 2007, ASE.

[21]  Rozaida Ghazali,et al.  A survey on bug prioritization , 2017, Artificial Intelligence Review.

[22]  Stéphane Ducasse,et al.  Semantic clustering: Identifying topics in source code , 2007, Inf. Softw. Technol..

[23]  Hongfang Liu,et al.  Theory of relative defect proneness , 2008, Empirical Software Engineering.

[24]  Ian H. Witten,et al.  Chapter 10 – Deep learning , 2017 .

[25]  Günter Neumann,et al.  Assisting bug Triage in Large Open Source Projects Using Approximate String Matching , 2012, ICSEA 2012.

[26]  Gail C. Murphy,et al.  Automatic bug triage using text categorization , 2004, SEKE.

[27]  Bogdan Dit,et al.  TopicXP: Exploring topics in source code using Latent Dirichlet Allocation , 2010, 2010 IEEE International Conference on Software Maintenance.

[28]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[29]  Santonu Sarkar,et al.  Mining business topics in source code using latent dirichlet allocation , 2008, ISEC '08.

[30]  Thomas Zimmermann,et al.  Improving bug triage with bug tossing graphs , 2009, ESEC/FSE '09.

[31]  Premkumar T. Devanbu,et al.  BugCache for inspections: hit or miss? , 2011, ESEC/FSE '11.

[32]  Nicholas A. Kraft,et al.  Changeset-Based Topic Modeling of Software Repositories , 2020, IEEE Transactions on Software Engineering.

[33]  Osamu Mizuno,et al.  Bug prediction based on fine-grained module histories , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[34]  Thomas L. Griffiths,et al.  Online Inference of Topics with Latent Dirichlet Allocation , 2009, AISTATS.

[35]  Hosik Choi,et al.  An empirical study on classification methods for alarms from a bug-finding static C analyzer , 2007, Inf. Process. Lett..

[36]  Chong Wang,et al.  Reading Tea Leaves: How Humans Interpret Topic Models , 2009, NIPS.

[37]  Letha H. Etzkorn,et al.  Bug localization using latent Dirichlet allocation , 2010, Inf. Softw. Technol..

[38]  YangJunfeng,et al.  Correlation exploitation in error ranking , 2004 .

[39]  Claes Wohlin,et al.  Experimentation in Software Engineering , 2012, Springer Berlin Heidelberg.

[40]  Taketoshi Yoshida,et al.  En-LDA: An Novel Approach to Automatic Bug Report Assignment with Entropy Optimized Latent Dirichlet Allocation , 2017, Entropy.

[41]  Sarah Smith Heckman,et al.  A systematic literature review of actionable alert identification techniques for automated static code analysis , 2011, Inf. Softw. Technol..

[42]  Nicholas A. Kraft,et al.  Modeling changeset topics for feature location , 2015, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[43]  Hung Viet Nguyen,et al.  A topic-based approach for narrowing the search space of buggy files from a bug report , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[44]  J. David Morgenthaler,et al.  Predicting accurate and actionable static analysis warnings , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[45]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[46]  Michael D. Ernst,et al.  Which warnings should I fix first? , 2007, ESEC-FSE '07.

[47]  Lionel C. Briand,et al.  A systematic and comprehensive investigation of methods to build and evaluate fault prediction models , 2010, J. Syst. Softw..