A study of mutual information based feature selection for case based reasoning in software cost estimation

Software cost estimation is one of the most crucial activities in the software development process. Over the past decades, many methods have been proposed for cost estimation, and case based reasoning (CBR) is one of them. Feature selection is an important preprocessing stage of case based reasoning. Most existing feature selection methods for case based reasoning are 'wrappers', which usually yield high fitting accuracy at the cost of high computational complexity and poor explainability of the selected features. In this study, mutual information based feature selection for case based reasoning (MICBR) is proposed. The approach is a hybrid of the 'wrapper' and 'filter' mechanisms. Filters are another kind of feature selector with much lower complexity than wrappers, and the features they select are more likely to generalize to other conditions. MICBR is then compared with popular feature selectors and with published results. The results show that MICBR is an effective feature selector for case based reasoning, overcoming some of the limitations and computational complexity of other feature selection techniques in the field.
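
To make the filter idea concrete, the following is a minimal sketch, not the MICBR algorithm itself, whose details the abstract does not give. It assumes a simple equal-width histogram estimate of mutual information (the original work may use a different estimator), ranks each project feature by its mutual information with effort, and then produces an analogy-based estimate from the nearest cases on the selected features. The function names, the toy project data, and the parameters `bins` and `k` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Equal-width binning of a numeric column into integer bin labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete labels."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def rank_features(rows, efforts, bins=3):
    """Filter step: score each feature column by MI with binned effort."""
    y = discretize(efforts, bins)
    scores = []
    for j in range(len(rows[0])):
        col = discretize([r[j] for r in rows], bins)
        scores.append((mutual_information(col, y), j))
    return sorted(scores, reverse=True)  # highest MI first

def cbr_estimate(rows, efforts, query, features, k=1):
    """CBR step: mean effort of the k nearest cases on the selected features."""
    def dist(r):
        return math.sqrt(sum((r[j] - query[j]) ** 2 for j in features))
    nearest = sorted(range(len(rows)), key=lambda i: dist(rows[i]))[:k]
    return sum(efforts[i] for i in nearest) / k

# Hypothetical toy data: feature 0 tracks effort, feature 1 is unrelated noise.
rows = [(1, 5), (2, 1), (3, 4), (4, 2), (5, 5), (6, 1)]
efforts = [10, 20, 30, 40, 50, 60]

ranking = rank_features(rows, efforts)
selected = [ranking[0][1]]  # keep only the top-ranked feature
estimate = cbr_estimate(rows, efforts, (3.2, 1), selected, k=2)
print(ranking, estimate)
```

Because the ranking step never runs the estimator, its cost grows with the number of features rather than with the number of candidate subsets, which is the complexity advantage of filters over wrappers that the abstract describes. A hybrid scheme could use such a ranking to shortlist features and then evaluate only a few subsets with the CBR estimator.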
