Experiments with Analogy-X for Software Cost Estimation

We developed a novel method called Analogy-X to provide statistical inference procedures for analogy-based software effort estimation. Analogy-X statistically evaluates the relationship between project features and the target features to be estimated, such as effort. This ensures that the dataset used is relevant to the prediction problem and that project features are selected based on their statistical contribution to the target variables. We hypothesize that the method (1) can be easily applied to a much larger dataset and (2) can incorporate joint effort and duration estimation into analogy, which was not previously possible with conventional analogy estimation. To test these two hypotheses, we conducted two experiments using different datasets. Our results show that Analogy-X handles very large datasets effectively and provides useful statistics for assessing the quality of the dataset. In addition, our results show that feature selection for duration estimation differs from feature selection for joint effort-duration estimation. We conclude that Analogy-X allows users to assess the best procedure for estimating duration given their specific requirements and dataset.
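To make the idea of statistically evaluating the relationship between project features and target features concrete, the following is a minimal sketch of a Mantel-style randomization test on distance matrices. It is an illustration under stated assumptions, not the Analogy-X procedure itself: the function names (`distance_matrix`, `mantel_test`), the choice of Euclidean distance, the number of permutations, and the synthetic data are all hypothetical. A target row may hold effort alone or effort and duration together, which is how a joint target could be expressed in this sketch.

```python
import math
import random


def distance_matrix(rows):
    """Pairwise Euclidean distances between rows (each row is a list of numbers)."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling projects by `order`."""
    n = len(matrix)
    if order is None:
        order = list(range(n))
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel_test(features, targets, permutations=999, seed=0):
    """Correlate feature distances with target distances and estimate a
    one-sided p-value by randomly relabelling the projects in the target matrix."""
    rng = random.Random(seed)
    d_feat = distance_matrix(features)
    d_targ = distance_matrix(targets)
    x = upper_triangle(d_feat)
    observed = pearson(x, upper_triangle(d_targ))
    n = len(features)
    count = 1  # the observed arrangement counts as one of the permutations
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if pearson(x, upper_triangle(d_targ, order)) >= observed:
            count += 1
    return observed, count / (permutations + 1)


# Synthetic, made-up data: one feature (size) and a joint (effort, duration) target.
features = [[float(i)] for i in range(10)]
targets = [[2.0 * i, 0.5 * i] for i in range(10)]
r, p = mantel_test(features, targets)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

In this sketch, a small p-value suggests that projects that are close in feature space also tend to be close in target space, which is the premise an analogy-based estimate relies on. Repeating the test with different feature subsets is one hedged way to picture feature selection by statistical contribution.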
