Prediction & Assessment of Change Prone Classes Using Statistical & Machine Learning Techniques

Software today has become an inseparable part of our lives. To meet the ever-growing demands of customers, software must evolve rapidly and incorporate numerous changes. In this paper, our aim is to study the relationship between object-oriented metrics and the change proneness of a class. Prediction models based on this study can help identify the change-prone classes of a software system. Testing effort can then be focused on these change-prone classes to yield better-quality software. Previously, researchers have used statistical methods for predicting change-prone classes, but machine learning methods have rarely been used for this purpose. In our study, we evaluate and compare the performance of ten machine learning methods with that of the statistical method. This evaluation is based on two open source software systems written in Java. We also validated the developed prediction models on the data set of another software system in the same domain (3D modelling). The performance of the prediction models was evaluated using receiver operating characteristic (ROC) analysis. The results indicate that the machine learning methods are on par with the statistical method for predicting change-prone classes. A further analysis showed that the models constructed for one software system can also be used to predict the change-prone nature of classes in another system of the same domain. This study should help developers perform effective regression testing at low cost and effort. It should also help developers design systems with fewer change-prone classes and, hence, better maintainability.
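The comparison described above can be illustrated with a minimal sketch: train the statistical baseline (logistic regression) and a few representative machine learning classifiers on class-level object-oriented metrics and compare their ROC AUC under 10-fold cross-validation. The file name, column names, and the particular classifiers below are assumptions for illustration only; the paper's actual tool chain, metric suite, and set of ten methods may differ.

```python
# A minimal sketch, assuming a CSV of class-level OO metrics (e.g. CBO, LCOM, WMC,
# DIT, NOC, RFC) with a binary "changed" label per class. The file and column names
# are hypothetical, not taken from the paper.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

data = pd.read_csv("oo_metrics_release_a.csv")     # hypothetical metrics file
X = data.drop(columns=["class_name", "changed"])   # OO design metrics as predictors
y = data["changed"]                                # 1 if the class changed between releases

models = {
    "logistic regression (statistical baseline)": LogisticRegression(max_iter=1000),
    "bagging": BaggingClassifier(),
    "random forest": RandomForestClassifier(),
    "adaboost": AdaBoostClassifier(),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    # Mean area under the ROC curve over 10-fold cross-validation,
    # mirroring the ROC-based comparison described in the abstract.
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```

Cross-project validation, as described in the abstract, would follow the same pattern: fit each model on the metrics of one system and report the ROC AUC obtained when scoring the classes of another system in the same domain.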
