A Ranking Stability Indicator for Selecting the Best Effort Estimator in Software Cost Estimation

Software effort estimation research shows no universal agreement on the "best" effort estimation approach. This is largely due to the "ranking instability" problem: the ranking of estimators is highly contingent on the evaluation criteria and on the subset of the data used in the investigation. Because a large number of method combinations exist for software effort estimation, selecting the most suitable combination is the subject of this paper. Unless we can determine reasonably stable rankings of different estimators, we cannot determine the most suitable estimator for effort estimation. This paper reports an empirical study applying 90 estimation methods to 20 datasets as an attempt to address this question. Performance was assessed using MAR, MMRE, MMER, MBRE, MIBRE, MdMRE, and PRED(25), and compared using a Wilcoxon signed-rank test (95%). A comprehensive empirical experiment was carried out. The results show that prior studies of ranking instability of effort estimation approaches may have been overly pessimistic. Given the large number of datasets, it is now possible to draw stable conclusions about the relative performance of different effort estimation methods and to select the most suitable ones for the study under investigation. In this study, regression trees and analogy-based methods were the best performers in the experiment, and we recommend against neural nets and simple linear regression. Based on the proposed evaluation method, we are able to determine the most suitable local estimator for software cost estimation, an important step in the application of any effort estimation analysis.
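The accuracy criteria named above follow standard definitions in the effort-estimation literature. As a minimal sketch (using the textbook formulas, not code from the study, and illustrative numbers rather than study data): MAR is the mean absolute residual, MMRE the mean of |actual − predicted| / actual, MMER the mean of |actual − predicted| / predicted, and PRED(25) the fraction of projects whose relative error is at most 25%.

```python
# Standard effort-estimation accuracy metrics (textbook formulas;
# the inputs below are illustrative, not data from the study).

def mar(actual, predicted):
    """Mean Absolute Residual: mean of |a - p|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |a - p| / a."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def mmer(actual, predicted):
    """Mean Magnitude of Error Relative to the estimate: mean of |a - p| / p."""
    return sum(abs(a - p) / p for a, p in zip(actual, predicted)) / len(actual)

def pred(actual, predicted, level=0.25):
    """PRED(25): fraction of projects with relative error <= level."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)

# Three hypothetical projects: actual vs. predicted effort.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 140.0, 390.0]
print(mar(actual, predicted))        # 26.666...
print(pred(actual, predicted))       # 0.666... (2 of 3 within 25%)
```

MMRE and PRED(25) are win-lose criteria: lower MMRE is better, higher PRED(25) is better. Comparing two estimators then amounts to pairing their per-dataset scores and applying a Wilcoxon signed-rank test at the stated 95% confidence level.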
