Application of Random Forest in Predicting Fault-Prone Classes

Metrics are available for predicting fault-prone classes, and they can help software organizations plan and carry out testing activities by directing resources to the fault-prone parts of the design and code. The importance and usefulness of such metrics is therefore clear, but validating them empirically remains a major challenge. The random forest (RF) algorithm has been applied successfully to regression and classification problems in many applications. This paper evaluates the capability of the RF algorithm to predict fault-prone software classes, using data from open source software. The results indicate that the prediction performance of random forest is good. However, similar studies are needed to establish the acceptability of the RF model.
