Prediction of defect severity by mining software project reports

With ever-increasing demands on software organizations, the rate at which defects are introduced into software cannot be ignored and has become a serious cause for concern. Defects that creep into software carry varying severity levels, ranging from mild to catastrophic, and the severity associated with a defect is its most critical attribute. In this paper, we build prediction models that assign an appropriate severity level (high, medium, low or very low) to the defects described in defect reports. We consider defect reports from the public-domain PITS datasets (PITS A, PITS C, PITS D and PITS E), which are widely used by NASA's engineers. Relevant data are extracted from the defect reports using text mining techniques, and prediction models are then built using one statistical method, Multinomial Multivariate Logistic Regression (MMLR), and two machine learning methods, Multi-layer Perceptron (MLP) and Decision Tree (DT). The performance of the models is evaluated using receiver operating characteristic (ROC) analysis, and the DT model is observed to perform best, outperforming both the MMLR and MLP models.
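The workflow described above, turning defect-report text into term features and then comparing a multinomial logistic regression, a multi-layer perceptron and a decision tree under ROC analysis, can be sketched with scikit-learn as follows. This is a minimal illustration rather than the authors' implementation; the CSV file name, the "description" and "severity" column names, and all hyperparameters are assumptions.

```python
# Minimal sketch (not the authors' code): text-mine defect reports and predict
# severity with multinomial logistic regression (MMLR stand-in), a multi-layer
# perceptron (MLP) and a decision tree (DT), scored with one-vs-rest ROC AUC.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical defect-report data: free-text description plus a severity label
# (high / medium / low / very low). File and column names are assumptions.
reports = pd.read_csv("pits_defect_reports.csv")
X_text, y = reports["description"], reports["severity"]

# Text mining step: tokenize, drop English stop words, weight terms with TF-IDF.
vectorizer = TfidfVectorizer(stop_words="english", max_features=100)
X = vectorizer.fit_transform(X_text)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "MMLR": LogisticRegression(max_iter=1000),  # handles multinomial targets
    "MLP": MLPClassifier(hidden_layer_sizes=(50,), max_iter=500),
    "DT": DecisionTreeClassifier(max_depth=10),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    # One-vs-rest ROC AUC as a stand-in for the ROC analysis described above.
    auc = roc_auc_score(y_test, proba, multi_class="ovr")
    print(f"{name}: ROC AUC = {auc:.3f}")
```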
