Impact Analysis of Missing Values on the Prediction Accuracy of Analogy-based Software Effort Estimation Method AQUA

Effort estimation by analogy (EBA) is often confronted with missing values. Our former analogy-based method AQUA is able to tolerate missing values in the data set, but it is unclear how the percentage of missing values affects prediction accuracy and whether there is an upper bound on this percentage beyond which AQUA is no longer applicable. This paper investigates these questions through an impact analysis conducted on seven data sets of different sizes and with different initial percentages of missing values. The major results are that (i) we confirm the intuition that the more missing values there are, the poorer the prediction accuracy of AQUA; (ii) there is a quadratic dependency between prediction accuracy and the percentage of missing values; and (iii) the upper limit of missing values for the applicability of AQUA is determined to be 40%. These results are obtained in the context of AQUA. Further analysis is necessary for other ways of applying EBA, such as using similarity measures or analogy adaptation methods different from those used in AQUA. For that purpose, the experimental design of this study can be adapted.
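
The following is a minimal, hypothetical sketch of the kind of impact analysis described above: it injects missing values into a synthetic project data set at increasing percentages, measures the error of a toy nearest-analogue estimator, and fits a quadratic curve to the error-versus-missingness relationship. All function names (inject_missing, nearest_analogy_estimate, mmre), the synthetic data, and the toy estimator are illustrative assumptions, not AQUA's actual similarity measure or adaptation method.

```python
# Illustrative sketch only: the estimator and data are synthetic stand-ins,
# not the AQUA method or the seven data sets used in the paper.
import numpy as np

rng = np.random.default_rng(42)

def inject_missing(X, fraction, rng):
    """Randomly mask `fraction` of the feature cells with NaN."""
    X = X.copy()
    mask = rng.random(X.shape) < fraction
    X[mask] = np.nan
    return X

def nearest_analogy_estimate(X_train, y_train, x_query):
    """Toy analogy estimator: Euclidean distance over jointly observed
    attributes; the prediction is the effort of the closest analogue."""
    dists = []
    for row in X_train:
        common = ~np.isnan(row) & ~np.isnan(x_query)
        if not common.any():
            dists.append(np.inf)
        else:
            dists.append(np.linalg.norm(row[common] - x_query[common]))
    return y_train[int(np.argmin(dists))]

def mmre(X, y, fraction, rng):
    """Leave-one-out mean magnitude of relative error after masking
    `fraction` of the attribute cells."""
    Xm = inject_missing(X, fraction, rng)
    errors = []
    for i in range(len(y)):
        idx = np.arange(len(y)) != i
        pred = nearest_analogy_estimate(Xm[idx], y[idx], Xm[i])
        errors.append(abs(pred - y[i]) / y[i])
    return float(np.mean(errors))

# Synthetic stand-in for a project data set: 60 projects, 5 attributes.
X = rng.random((60, 5))
y = 100 + 400 * X[:, 0] + 200 * X[:, 1] + 20 * rng.standard_normal(60)

fractions = np.arange(0.0, 0.65, 0.05)
scores = [mmre(X, y, f, rng) for f in fractions]

# Quadratic fit: error ~ a*p^2 + b*p + c, mirroring the reported quadratic
# dependency between prediction accuracy and missing-value percentage.
a, b, c = np.polyfit(fractions, scores, deg=2)
print(f"fitted quadratic: {a:.3f}*p^2 + {b:.3f}*p + {c:.3f}")
```

In a study following the paper's design, the toy estimator would be replaced by the EBA method under investigation, the synthetic data by real project data sets, and the stopping point of the missingness sweep chosen by where the fitted error curve exceeds an acceptable accuracy threshold (40% in the case of AQUA).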
