Empirical analysis of search based algorithms to identify change prone classes of open source software

There are numerous reasons for change in software, such as changing requirements, evolving technology, increasing customer demands and defect fixes. Identifying and analyzing the change-prone classes of a software system during its evolution is therefore gaining importance in software engineering, because it helps developers allocate testing and maintenance resources judiciously. Software metrics can be used to construct classification models for the timely identification of change-prone classes. Search-based algorithms, which form a subset of machine learning algorithms, can be used to construct such prediction models. A search-based algorithm uses a fitness function to search the space of possible solutions for an optimal one. In this work, we analyze the effectiveness of hybridized search-based algorithms for change prediction, that is, whether they can construct accurate models to predict change-prone classes. We have also constructed models using machine learning techniques and compared their performance with that of the models constructed using search-based algorithms. The validation is carried out on two open source Apache projects, Rave and Commons Math. The results show the effectiveness of hybridized search-based algorithms in predicting change-prone classes, so software developers can use them to produce more efficient, better-developed software.
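The abstract describes search-based algorithms as optimizing a fitness function over candidate solutions. As a minimal sketch of that idea only (plain hill climbing, not the hybridized algorithms evaluated in this work), the Python below tunes a linear rule over two object-oriented metrics, using g-mean as the fitness function. The data set, the choice of metrics, the names `predict`, `g_mean`, `mutate` and `hill_climb`, and the mutation step sizes are all hypothetical assumptions made for illustration.

```python
import math
import random

# Hypothetical toy data: each row holds two object-oriented metric values
# (e.g. lines of code and coupling) for one class; label 1 = change-prone.
# These numbers are invented; they are NOT from Apache Rave or Commons Math.
DATA = [
    ((120, 8), 1), ((340, 15), 1), ((95, 11), 1), ((410, 20), 1),
    ((30, 2), 0), ((55, 3), 0), ((80, 4), 0), ((20, 1), 0),
    ((60, 6), 0), ((45, 2), 0),
]


def predict(weights, threshold, metrics):
    """Linear rule: flag a class as change-prone when w . x >= threshold."""
    score = sum(w * x for w, x in zip(weights, metrics))
    return 1 if score >= threshold else 0


def g_mean(solution, data):
    """Fitness: geometric mean of sensitivity and specificity."""
    weights, threshold = solution
    tp = fn = tn = fp = 0
    for metrics, label in data:
        pred = predict(weights, threshold, metrics)
        if label == 1:
            tp += pred
            fn += 1 - pred
        else:
            tn += 1 - pred
            fp += pred
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def mutate(solution, rng):
    """Gaussian perturbation of every weight and the threshold.

    The step sizes (0.5 and 20) are arbitrary assumptions for this toy data.
    """
    weights, threshold = solution
    new_weights = [w + rng.gauss(0, 0.5) for w in weights]
    return (new_weights, threshold + rng.gauss(0, 20))


def hill_climb(data, iterations=2000, seed=0):
    """Plain hill climbing: keep a mutated candidate if it is no worse."""
    rng = random.Random(seed)
    best = ([rng.uniform(-1, 1) for _ in range(2)], rng.uniform(0, 200))
    best_fit = g_mean(best, data)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        fit = g_mean(candidate, data)
        if fit >= best_fit:  # accept equal-fitness moves to cross plateaus
            best, best_fit = candidate, fit
    return best, best_fit


if __name__ == "__main__":
    solution, fitness = hill_climb(DATA)
    print(f"weights={solution[0]}, threshold={solution[1]:.1f}, g-mean={fitness:.3f}")
```

Accepting moves of equal fitness lets the search cross the flat regions that a count-based fitness such as g-mean creates. Population-based methods such as genetic algorithms or particle swarm optimization replace the single candidate with a set of candidates but keep the same fitness-driven loop.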
Used hybridized search-based algorithms to identify change-prone classes.
Used two open source projects, Apache Rave and Commons Math, for empirical validation.
Assessed the performance of the search-based algorithms using g-mean and accuracy.
Constructed machine learning models and compared their performance with that of the hybridized models.
Results showed that the hybridized models outperformed the machine learning models.
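The g-mean and accuracy measures named above are commonly defined from the confusion matrix, taking change-prone classes as the positive class. This is the standard definition, assumed here rather than quoted from this work. The worked case uses hypothetical counts (90 unchanged and 10 change-prone classes) to show why g-mean is reported alongside accuracy: a model that labels every class as not change-prone scores high accuracy but a g-mean of zero.

```latex
\text{Sensitivity} = \frac{TP}{TP+FN}, \qquad
\text{Specificity} = \frac{TN}{TN+FP}

\text{G-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}, \qquad
\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}

% Hypothetical case: predict "not change-prone" for all 100 classes,
% so TP = 0, FN = 10, TN = 90, FP = 0.
\text{Accuracy} = \frac{0+90}{100} = 0.9, \qquad
\text{Sensitivity} = \frac{0}{10} = 0, \qquad
\text{Specificity} = \frac{90}{90} = 1

\text{G-mean} = \sqrt{0 \times 1} = 0
```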
