Using text clustering to predict defect resolution time: a conceptual replication and an evaluation of prediction accuracy

Defect management is a central task in software maintenance. When a defect is reported, appropriate resources must be allocated to analyze and resolve the defect. An important issue in resource allocation is the estimation of Defect Resolution Time (DRT). Prior research has considered different approaches for DRT prediction that exploit information retrieval techniques and similarity in textual defect descriptions. In this article, we investigate the potential of text clustering for DRT prediction. We build on a study published by Raja (2013), which demonstrated that clusters of similar defect reports had statistically significant differences in DRT. Raja’s study also suggested that this difference between clusters could be used for DRT prediction. Our aims are twofold: first, to conceptually replicate Raja’s study and to assess the repeatability of its results in different settings; second, to investigate the potential of textual clustering of issue reports for DRT prediction, with a focus on accuracy. Using different data sets, a different text mining tool, and a different clustering technique, we first conduct an independent replication of the original study. We then design a fully automated prediction method based on clustering and evaluate its accuracy in a simulated test scenario. The results of our independent replication are comparable to those of the original study, and we confirm the initial findings regarding significant differences in DRT between clusters of defect reports. However, the simulated test scenario used to assess our prediction method yields poor results in terms of DRT prediction accuracy. Although our replication confirms the main finding from the original study, our attempt to use text clustering as the basis for DRT prediction did not achieve practically useful levels of accuracy.
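To make the idea of cluster-based DRT prediction concrete, the following is a minimal sketch, not the method used in the article or in Raja's study. It assumes bag-of-words vectors, a simplified k-means with cosine similarity and hand-picked seed reports, and a prediction equal to the mean DRT of the closest cluster. The toy reports, the DRT values (taken to be in days), and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean


def vectorize(text):
    """Lower-cased bag-of-words term counts for one defect description."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average the term counts of the vectors in one cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def cluster(vectors, seeds, iterations=10):
    """Simplified k-means: seeds are indices of the initial centroid reports."""
    centroids = [vectors[i] for i in seeds]
    labels = []
    for _ in range(iterations):
        # Assign every report to its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Recompute each centroid from its current members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids


def predict_drt(text, centroids, labels, drts):
    """Predict DRT as the mean DRT of the cluster closest to the new report."""
    v = vectorize(text)
    k = max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
    members = [d for d, lab in zip(drts, labels) if lab == k]
    # Fall back to the overall mean if the chosen cluster is empty.
    return mean(members) if members else mean(drts)


# Hypothetical resolved defect reports: (description, DRT in days).
reports = [
    ("crash on startup null pointer", 2),
    ("null pointer crash when opening file", 4),
    ("update user manual typo", 10),
    ("typo in installation manual", 12),
]
vectors = [vectorize(text) for text, _ in reports]
drts = [drt for _, drt in reports]

labels, centroids = cluster(vectors, seeds=[0, 2])
prediction = predict_drt("application crash with null pointer", centroids, labels, drts)
print(labels, prediction)  # [0, 0, 1, 1] 3
```

On real issue trackers the DRT values within a cluster vary widely, so a cluster mean of this kind can differ significantly between clusters and still be a poor point estimate for an individual report, which is consistent with the low accuracy reported above.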

[1]  Tim Menzies, et al. Special issue on repeatable results in software engineering prediction, 2012, Empirical Software Engineering.

[2]  Andreas Zeller, et al. Why Programs Fail: A Guide to Systematic Debugging, 2005.

[3]  Mohamed Kholief, et al. Improving bug fix-time prediction model by filtering out outliers, 2013, 2013 The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE).

[4]  Barry Boehm, et al. Top 10 list [software development], 2001.

[5]  James Miller, et al. Replicating software engineering experiments: a poisoned chalice or the Holy Grail, 2005, Inf. Softw. Technol.

[6]  Barbara A. Kitchenham,et al.  The role of replications in empirical software engineering—a word of warning , 2008, Empirical Software Engineering.

[7]  Robert Rosenthal, et al. Replication in behavioral research, 1990.

[8]  April Kontostathis, et al. Essential Dimensions of Latent Semantic Indexing (LSI), 2007, 2007 40th Annual Hawaii International Conference on System Sciences (HICSS'07).

[9]  Dalal Alrajeh, et al. Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering, 2011.

[10]  H. Lilliefors. On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown, 1967.

[11]  T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.

[12]  Lucas D. Panjer. Predicting Eclipse Bug Lifetimes, 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[13]  Natalia Juristo Juzgado, et al. Understanding replication of experiments in software engineering: A classification, 2014, Inf. Softw. Technol.

[14]  Adam A. Porter, et al. Empirical studies of software engineering: a roadmap, 2000, ICSE '00.

[15]  Andreas Zeller, et al. How Long Will It Take to Fix This Bug?, 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[16]  Markus Hofmann, et al. RapidMiner: Data Mining Use Cases and Business Analytics Applications, 2013.

[17]  Gregory Tassey, et al. Prepared for what, 2007.

[18]  Jeffrey C. Carver, et al. The role of replications in Empirical Software Engineering, 2008, Empirical Software Engineering.

[19]  Mika Mäntylä, et al. Survey Reproduction of Defect Reporting in Industrial Software Development, 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[20]  Per Runeson, et al. Analyzing Networks of Issue Reports, 2013, 2013 17th European Conference on Software Maintenance and Reengineering.

[21]  Richard A. Harshman, et al. Indexing by latent semantic indexing, 1990.

[22]  Iulian Neamtiu, et al. Bug-fix time prediction models: can we do better?, 2011, MSR '11.

[23]  Janice Singer, et al. Guide to Advanced Empirical Software Engineering, 2007.

[24]  Richard C. Dubes, et al. Cluster Analysis and Related Issues, 1993, Handbook of Pattern Recognition and Computer Vision.

[25]  Markus Borg, et al. Enabling traceability reuse for impact analyses: A feasibility study in a safety context, 2013, 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE).

[26]  Montserrat Batet, et al. Ontology-based semantic clustering, 2011, AI Commun.

[27]  Mark Harman, et al. Empirical Software Engineering and Verification, 2012, Lecture Notes in Computer Science.

[28]  Claes Wohlin, et al. Experimentation in Software Engineering, 2012, Springer Berlin Heidelberg.

[29]  Tovi Grossman, et al. CommunityCommands: command recommendations for software applications, 2009, UIST '09.

[30]  Ting Su, et al. A deterministic method for initializing K-means clustering, 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[31]  Christoph Treude, et al. A comparative exploration of FreeBSD bug lifetimes, 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[32]  Barbara A. Kitchenham, et al. Experiments with Analogy-X for Software Cost Estimation, 2008, 19th Australian Conference on Software Engineering (aswec 2008).

[33]  Harald C. Gall, et al. Predicting the fix time of bugs, 2010, RSSE '10.

[34]  Martin J. Shepperd, et al. Using simulation to evaluate prediction techniques [for software], 2001, Proceedings Seventh International Software Metrics Symposium.

[35]  Natalia Juristo Juzgado, et al. Replication of Software Engineering Experiments, 2010, LASER Summer School.

[36]  Mladen A. Vouk, et al. On predicting the time taken to correct bug reports in open source projects, 2009, 2009 IEEE International Conference on Software Maintenance.

[37]  Terry Caelli. Structural, syntactic, and statistical pattern recognition: joint IAPR International Workshops SSPR 2002 and SPR 2002, Windsor, Ontario, Canada, August 6-9, 2002: proceedings, 2002.

[38]  Uzma Raja, et al. All complaints are not created equal: text analysis of open source software defect reports, 2012, Empirical Software Engineering.

[39]  Jesús M. González-Barahona, et al. On the reproducibility of empirical software engineering studies based on data retrieved from development repositories, 2011, Empirical Software Engineering.

[40]  R. Blair, et al. A more realistic look at the robustness and Type II error properties of the t test to departures from population normality, 1992.

[41]  Bart Baesens, et al. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings, 2008, IEEE Transactions on Software Engineering.

[42]  Ahmed E. Hassan, et al. Explaining software defects using topic models, 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[43]  Rui Xu, et al. Survey of clustering algorithms, 2005, IEEE Transactions on Neural Networks.

[44]  Liang Gong, et al. Predicting bug-fixing time: An empirical study of commercial software projects, 2013, 2013 35th International Conference on Software Engineering (ICSE).

[45]  A. Brooks, et al. Replication's Role in Software Engineering, 2008, Guide to Advanced Empirical Software Engineering.

[46]  Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[47]  Thomas G. Dietterich. Machine Learning for Sequential Data: A Review, 2002, SSPR/SPR.

[48]  Ahmed E. Hassan, et al. Think locally, act globally: Improving defect and effort prediction models, 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[49]  Robert J. Walker, et al. Simulation - A Methodology to Evaluate Recommendation Systems in Software Engineering, 2014, Recommendation Systems in Software Engineering.

[50]  Qi Li, et al. "Is It Really a Defect?" An Empirical Study on Measuring and Improving the Process of Software Defect Reporting, 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[51]  Michele Lanza, et al. An extensive comparison of bug prediction approaches, 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[52]  Jason H. Moore, et al. Optimization of gene set annotations via entropy minimization over variable clusters (EMVC), 2014, Bioinform.

[53]  Ying Zou, et al. Studying the fix-time for bugs in large open source projects, 2011, Promise '11.

[54]  Robert Tibshirani, et al. Estimating the number of clusters in a data set via the gap statistic, 2000.

[55]  Sunghun Kim, et al. How long did it take to fix bugs?, 2006, MSR '06.

[56]  Natalia Juristo Juzgado, et al. Replications of software engineering experiments, 2013, Empirical Software Engineering.

[57]  Tim Menzies, et al. Local vs. global models for effort estimation and defect prediction, 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[58]  Christian Bird, et al. The inductive software engineering manifesto: principles for industrial data mining, 2011, MALETS '11.

[59]  K. Weick. What Theory Is Not, Theorizing Is, 1995.

[60]  Andreas Zeller. CHAPTER 6 – Scientific Debugging, 2009.

[61]  Phillip A. Laplante, et al. A Literature Review of Research in Software Defect Reporting, 2013, IEEE Transactions on Reliability.

[62]  Per Runeson, et al. IR in Software Traceability: From a Bird's Eye View, 2013, 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement.

[63]  Yasutaka Kamei, et al. Mining challenge 2012: The Android platform, 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[64]  Markus Borg, et al. Embrace your issues: compassing the software engineering landscape using bug reports, 2014, ASE.

[65]  Brian Robinson, et al. Improving industrial adoption of software engineering research: a comparison of open and closed source software, 2010, ESEM '10.

[66]  Anil K. Jain. Data clustering: 50 years beyond K-means, 2008, Pattern Recognit. Lett.

[67]  Martin Shepperd, et al. Using Simulation to Evaluate Prediction Techniques, 2001.

[68]  Serge Demeyer, et al. Filtering Bug Reports for Fix-Time Analysis, 2012, 2012 16th European Conference on Software Maintenance and Reengineering.

[69]  Andreas Zeller, et al. Why Programs Fail, Second Edition: A Guide to Systematic Debugging, 2009.

[70]  Dietmar Pfahl, et al. Simulation Methods, 2019, Introductory Econometrics for Finance.

[71]  A.E. Hassan, et al. The road ahead for Mining Software Repositories, 2008, 2008 Frontiers of Software Maintenance.

[72]  Barry W. Boehm, et al. Software Defect Reduction Top 10 List, 2001, Computer.