Filtering of Inconsistent Software Project Data for Analogy-Based Effort Estimation

Accurate software effort estimation is essential for successful project management. To improve accuracy, a number of estimation techniques have been developed. Among these, Analogy-Based Estimation (ABE) has become one of the mainstream approaches to effort estimation. In general, ABE infers the effort required to accomplish a new project from the efforts of historical projects with similar characteristics. ABE is simple, yet it can be affected by noise in the historical projects. Noise generally refers to data corruptions that may negatively affect the performance of a model built on the historical data. In this study, we propose an approach to filtering noise in historical projects to improve the accuracy of ABE. We introduce and measure the Effort-Inconsistency Degree (EID), the degree to which the effort of a historical project is inconsistent with those of its similar projects. Based on EID, we identify and filter noise in the form of inconsistent historical project data. We have validated the performance of ABE with our approach and three representative filtering techniques, namely the Edited Nearest Neighbor algorithm, Univariate Outlier Elimination, and Genetic Algorithm based project selection, on three software project datasets (Desharnais, Maxwell, and ISBSG (International Software Benchmarking Standards Group) Telecom). The experimental results suggest that our approach improves the accuracy of ABE more effectively than the other approaches.
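To make the idea concrete, the EID-based filtering described above can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the similarity measure (Euclidean distance over project features), the aggregation of neighbor efforts (mean), the relative-deviation score, and the cutoff `threshold` are all assumptions introduced here for illustration.

```python
import numpy as np

def effort_inconsistency_degree(features, efforts, k=3):
    """Score each historical project by how much its effort deviates
    from the efforts of its k most similar projects (a hypothetical
    EID formulation; the paper's exact definition may differ)."""
    n = len(efforts)
    eid = np.zeros(n)
    for i in range(n):
        # Euclidean distance from project i to every other project
        dist = np.linalg.norm(features - features[i], axis=1)
        dist[i] = np.inf                      # exclude the project itself
        neighbors = np.argsort(dist)[:k]      # indices of k most similar projects
        neighbor_effort = efforts[neighbors].mean()
        # relative deviation of this project's effort from its analogues
        eid[i] = abs(efforts[i] - neighbor_effort) / neighbor_effort
    return eid

def filter_inconsistent(features, efforts, k=3, threshold=1.0):
    """Keep only projects whose EID is at or below the threshold."""
    eid = effort_inconsistency_degree(features, efforts, k)
    keep = eid <= threshold
    return features[keep], efforts[keep]

# Toy usage: the fourth project's effort is wildly inconsistent with its
# near-identical analogues, so it is filtered out before ABE is applied.
X = np.array([[1.0], [1.1], [0.9], [1.05]])
y = np.array([100.0, 110.0, 95.0, 500.0])
X_clean, y_clean = filter_inconsistent(X, y, k=3, threshold=1.0)
```

The retained projects would then serve as the historical case base for the usual ABE step of estimating a new project from its nearest analogues.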

[1]  Dennis L. Wilson,et al.  Asymptotic Properties of Nearest Neighbor Rules Using Edited Data , 1972, IEEE Trans. Syst. Man Cybern..

[2]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[3]  Anthony J. Hayter,et al.  Probability and statistics for engineers and scientists , 1996 .

[4]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[5]  Martin J. Shepperd,et al.  Estimating Software Project Effort Using Analogies , 1997, IEEE Trans. Software Eng..

[6]  Katrina D. Maxwell,et al.  Applied Statistics for Software Managers , 2002 .

[7]  Martin Shepperd,et al.  Case-Based Reasoning and Software Engineering , 2003 .

[8]  D. Ross Jeffery,et al.  An Empirical Study of Analogy-based Software Effort Estimation , 1999, Empirical Software Engineering.

[9]  Robert C. Mintram,et al.  Preliminary Data Analysis Methods in Software Estimation , 2005, Software Quality Journal.

[10]  Marek Grochowski,et al.  Comparison of Instances Seletion Algorithms I. Algorithms Survey , 2004, ICAISC.

[11]  Chris Mellish,et al.  Advances in Instance Selection for Instance-Based Learning Algorithms , 2002, Data Mining and Knowledge Discovery.

[12]  Carolyn Mair,et al.  The consistency of empirical comparisons of regression and analogy-based software project cost prediction , 2005, 2005 International Symposium on Empirical Software Engineering, 2005..

[13]  Simon C. K. Shiu,et al.  Combining feature reduction and case selection in building CBR classifiers , 2006, IEEE Transactions on Knowledge and Data Engineering.

[14]  Margaret Ross,et al.  Evaluation of preliminary data analysis framework in software cost estimation based on ISBSG R9 Data , 2008, Software Quality Journal.

[15]  Doo-Hwan Bae,et al.  An empirical analysis of software effort estimation with outlier elimination , 2008, PROMISE '08.

[16]  Thong Ngee Goh,et al.  A study of project selection and feature weighting for analogy based software cost estimation , 2009, J. Syst. Softw..