All complaints are not created equal: text analysis of open source software defect reports

As the use of Open Source Software (OSS) systems increases in the corporate environment, it is important to examine the maintenance process of these projects. OSS projects allow end users to submit defect reports directly when they encounter operational issues. Timely resolution of these defect reports requires effective management of maintenance resources. This study analyzes the usefulness of the textual content of defect reports as an early indicator of their resolution time. Text mining techniques are used to categorize the defect reports of five OSS projects. Significant variation in defect resolution time among the resulting categories, for each of the sample projects, indicates that a text-based classification of defect reports can be useful for early assessment of resolution time, before source-code-level analysis. Such a technique can assist in allocating sufficient maintenance resources to targeted defects and can also enable project teams to manage customer expectations regarding defect resolution times.
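The idea of grouping defect reports by their text and then comparing resolution times across groups can be illustrated with a small sketch. It is not the study's pipeline: the keyword rules stand in for the text mining step, the one-way ANOVA F statistic is only one plausible way to test for variation among categories (the abstract does not name the test), and the reports and resolution times are invented toy values. All names (`KEYWORDS`, `categorize`, `one_way_anova_f`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical keyword rules. The study derives categories with text mining;
# a hand-written list like this is only a stand-in for illustration.
KEYWORDS = {
    "crash": ("crash", "segfault", "exception"),
    "ui": ("button", "dialog", "layout"),
}


def categorize(text):
    """Return the first category whose keyword occurs in the report text."""
    lowered = text.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return "other"


def group_resolution_days(reports):
    """Map each category to the resolution times (in days) of its reports."""
    groups = defaultdict(list)
    for text, days in reports:
        groups[categorize(text)].append(days)
    return dict(groups)


def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic across the category groups."""
    samples = [g for g in groups.values() if g]
    k = len(samples)                      # number of categories
    n = sum(len(g) for g in samples)      # total number of reports
    grand_mean = sum(sum(g) for g in samples) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
    ss_within = sum((x - mean(g)) ** 2 for g in samples for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented toy data: (report text, days until resolution).
reports = [
    ("App crash on startup", 2),
    ("Segfault when saving", 4),
    ("Unhandled exception in parser", 3),
    ("Button misaligned in dialog", 10),
    ("Layout breaks on resize", 12),
    ("Dialog text truncated", 11),
    ("Typo in documentation", 20),
    ("Request: add dark mode", 22),
    ("Slow import of large files", 21),
]

groups = group_resolution_days(reports)
f_stat = one_way_anova_f(groups)
print(groups)
print(round(f_stat, 2))
```

A large F statistic means that resolution time differs more between categories than within them, which is the kind of signal that would make a text-based category useful as an early indicator. A real analysis would also check the ANOVA assumptions and compute a p-value.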
