Is it a bug or an enhancement?: a text-based approach to classify change requests

Bug tracking systems are valuable assets for managing maintenance activities. They are widely used in open-source projects as well as in the software industry. They collect many different kinds of issues: requests for defect fixing, enhancements, refactoring/restructuring activities, and organizational issues. These different kinds of issues are often all labeled simply as "bug", either for lack of better classification support or for lack of knowledge about the possible kinds. This paper investigates whether the text of the issues posted in bug tracking systems is sufficient to classify them into corrective maintenance and other kinds of activities. We show that alternating decision trees, naive Bayes classifiers, and logistic regression can be used to accurately distinguish bugs from other kinds of issues. Results from empirical studies performed on issues from Mozilla, Eclipse, and JBoss indicate that between 77% and 82% of issues can be classified correctly.
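To make the idea of text-based issue classification concrete, the following is a minimal sketch of a multinomial naive Bayes classifier with Laplace smoothing, one of the three classifier families named above. It is not the pipeline used in the study: the tokenizer, the function names (tokenize, train, classify), the label names, and the six training issues are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Collect per-label document and word counts from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return {"docs": doc_counts, "words": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-posterior for the given text."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["docs"].items():
        # Log prior: share of training issues carrying this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(model["words"][label].values())
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothed log likelihood of the token.
            count = model["words"][label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy issues; real trackers hold thousands of noisy reports.
TRAINING_ISSUES = [
    ("Crash when opening preferences dialog", "bug"),
    ("Null pointer exception on startup", "bug"),
    ("Application freezes and crashes on save", "bug"),
    ("Add support for dark theme", "other"),
    ("Please add option to export as PDF", "other"),
    ("Refactor the parser module for readability", "other"),
]

model = train(TRAINING_ISSUES)
print(classify(model, "Editor crashes with exception on startup"))
print(classify(model, "Add an option to export settings"))
```

On this toy data the first query is labeled "bug" and the second "other". A realistic setup would need far more training issues, some form of term selection or weighting, and a held-out evaluation before any accuracy figure could be quoted.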
