Evaluation of instance-based feature subset selection algorithm for maintainability prediction

Maintainability is an essential attribute of software quality and accounts for almost 60–70% of total project cost. Because software maintainability prediction is a complicated process, estimating maintainability in the early phases of the software development lifecycle (SDLC) is advantageous: it helps in building economical software and in planning resources well in advance. Software metrics are strongly correlated with software maintainability, as they help in examining the structural quality and characteristics of software. Feature subset selection (FSS) is an important data preprocessing technique used in data mining; it involves determining a subset of notable features for building a prediction model. Not all software metrics are equally relevant; hence, using all of them to predict maintainability significantly increases time, budget and effort. Thus, to achieve the best maintainability prediction results with a particular learning algorithm, it is critical to select the most relevant features that manifest the characteristics of the software at hand, which in this study are two open source software systems: Apache Jackrabbit and Light Weight Java Game Library (LWJGL). Our main focus has been to reduce the number of metrics using Relief, an instance-based FSS technique, and then to use these relevant metrics to predict maintainability. It was observed that the Linear Regression algorithm showed the maximum increase in accuracy, approximately 11%, when combined with the Relief FSS algorithm on both open source software datasets. This paper will enable software developers to improve design and coding and to identify the most relevant software metrics that affect software maintainability.
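To make the instance-based idea behind Relief concrete, the following is a minimal sketch of the original two-class formulation, not the implementation used in this study. The function name `relief`, its parameters, and the toy data are assumptions introduced for illustration. Each sampled instance rewards features that differ from its nearest neighbour of the other class (the "miss") and penalises features that differ from its nearest neighbour of the same class (the "hit"). A numeric maintainability target would call for a regression variant of Relief, which this sketch does not cover.

```python
import random


def relief(X, y, n_iterations=None, seed=0):
    """Two-class Relief sketch: return one relevance weight per feature.

    X is a list of numeric feature rows and y a list of class labels.
    All names here are illustrative assumptions, not the paper's code.
    """
    n, d = len(X), len(X[0])

    # Scale every feature to [0, 1] so per-feature differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    Z = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    def dist(a, b):
        # Manhattan distance over the scaled features.
        return sum(abs(p - q) for p, q in zip(a, b))

    rng = random.Random(seed)
    m = n_iterations or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        hit = min(hits, key=lambda k: dist(Z[i], Z[k]))
        miss = min(misses, key=lambda k: dist(Z[i], Z[k]))
        for j in range(d):
            # Reward separation from the miss, penalise spread from the hit.
            w[j] += (abs(Z[i][j] - Z[miss][j]) - abs(Z[i][j] - Z[hit][j])) / m
    return w


if __name__ == "__main__":
    # Toy data (hypothetical): metric 0 separates the classes, metric 1 does not.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2], [1.0, 0.5]]
    y = [0, 0, 0, 1, 1, 1]
    print(relief(X, y))
```

Metrics whose weight exceeds a chosen threshold, or the top-ranked few, would then be kept as inputs to the prediction model. The threshold is a tuning choice that this sketch leaves open.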