Noise in Bug Report Data and the Impact on Defect Prediction Results

The potential benefits of defect prediction have created widespread research interest and generated a considerable number of empirical studies. Applying these approaches to real-world data has revealed a central problem: real-world data is "dirty" and often of poor quality. Noise in bug report data is a particular problem for defect prediction because it affects the correct classification of software modules: is a module actually defective or not? In this paper, we examine different causes of noise encountered when predicting defects in an industrial software system, and we provide an overview of causes commonly reported in related work. Furthermore, we conduct an experiment to explore the impact of class noise on prediction performance. The experiment shows that the prediction results for the studied system remain reliable even at a noise level of 20%, i.e., a 20% probability of an incorrect link between a bug report and a module.
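The abstract does not specify how incorrect links were simulated. The following is a minimal sketch of one way to inject this kind of class noise. It assumes that each bug report links to exactly one module, and that "noise" means redirecting a link, with probability `p`, to a different module chosen uniformly at random. This is an illustrative assumption, not necessarily the procedure used in the study. All function names are hypothetical. A module is labelled defective if at least one bug report links to it. Redirecting a link can therefore turn a defective module clean (if it loses its only link) and a clean module defective, so the fraction of changed module labels generally differs from `p`.

```python
import random
from typing import Dict, List, Set


def inject_link_noise(links: Dict[str, str], modules: List[str],
                      p: float, seed: int = 0) -> Dict[str, str]:
    """Hypothetical helper: return a copy of the bug->module links in which
    each link is, with probability p, redirected to a different module
    chosen uniformly at random (assumes at least two modules)."""
    rng = random.Random(seed)
    noisy: Dict[str, str] = {}
    for bug, module in links.items():
        if rng.random() < p:
            # Assumed noise model: pick any module other than the true one.
            candidates = [m for m in modules if m != module]
            noisy[bug] = rng.choice(candidates)
        else:
            noisy[bug] = module
    return noisy


def defect_labels(links: Dict[str, str], modules: List[str]) -> Dict[str, bool]:
    """Label a module defective if at least one bug report links to it."""
    defective: Set[str] = set(links.values())
    return {m: (m in defective) for m in modules}


def label_disagreement(a: Dict[str, bool], b: Dict[str, bool]) -> float:
    """Fraction of modules whose defective/clean label differs between a and b."""
    return sum(a[m] != b[m] for m in a) / len(a)


if __name__ == "__main__":
    # Synthetic example data (not from the study): 100 modules, 60 bug
    # reports that all point into the first 30 modules.
    modules = [f"mod{i}" for i in range(100)]
    rng = random.Random(1)
    links = {f"bug{j}": rng.choice(modules[:30]) for j in range(60)}

    clean = defect_labels(links, modules)
    noisy = defect_labels(inject_link_noise(links, modules, p=0.2, seed=42), modules)
    print(f"{label_disagreement(clean, noisy):.2%} of module labels changed")
```

In a full experiment, the noisy labels would replace the original labels when training a prediction model, and the resulting performance would be compared across several noise levels.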
