Using blind analysis for software engineering experiments

Context: In recent years there has been growing concern about conflicting experimental results in empirical software engineering, paralleled by a growing awareness of how researcher bias can distort results. Objective: To explore the practicalities of blind analysis of experimental results as a means of reducing bias. Method: We apply blind analysis to a real software engineering experiment that compares three feature weighting approaches against a naïve benchmark (the sample mean) on the Finnish software effort data set, and we use this experiment to explore blind analysis as a method for reducing researcher bias. Results: Our experience shows that blinding can be a relatively straightforward procedure. We also highlight several statistical analysis decisions that ought not to be guided by the hunt for statistical significance, and show that results can be inverted merely by a seemingly inconsequential statistical choice (the degree of trimming). Conclusion: Whilst there are minor challenges and some limits to the degree of blinding possible, blind analysis is a practical and easy-to-implement method that supports more objective analysis of experimental results. We therefore argue that blind analysis should become the norm for analysing software engineering experiments.
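As a rough sketch of how such blinding might be set up in practice (the function, data, and treatment names below are hypothetical illustrations, not the study's actual tooling), a third party can replace the true treatment labels with neutral codes before the analyst ever sees the data, revealing the key only once the analysis is fixed:

    import random

    def blind_labels(rows, treatments, seed=None):
        # Map each true treatment name to a neutral code ("A", "B", ...) in a
        # shuffled order, so the analyst cannot guess which code is which.
        rng = random.Random(seed)
        shuffled = list(treatments)
        rng.shuffle(shuffled)
        key = {name: chr(ord("A") + i) for i, name in enumerate(shuffled)}
        blinded = [dict(row, treatment=key[row["treatment"]]) for row in rows]
        return blinded, key  # the key stays sealed until the analysis is frozen

    # Hypothetical usage: four conditions echoing the study's design, toy errors.
    rows = [
        {"treatment": "weighting_1", "error": 0.42},
        {"treatment": "weighting_2", "error": 0.38},
        {"treatment": "weighting_3", "error": 0.51},
        {"treatment": "mean_benchmark", "error": 0.47},
    ]
    blinded_rows, sealed_key = blind_labels(
        rows, ["weighting_1", "weighting_2", "weighting_3", "mean_benchmark"], seed=1)

The analyst then works only with blinded_rows; sealed_key is held by a colleague and opened only after all analysis decisions (choice of tests, trimming levels, outlier handling) have been committed.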

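The sensitivity to trimming reported in the Results can be seen with a toy example (the error values below are invented purely for illustration and are unrelated to the Finnish data set): with no trimming one method appears better, while a 20% trimmed mean inverts the ranking.

    from scipy.stats import trim_mean

    # Invented prediction-error samples: A is mostly accurate with one gross
    # outlier, B is uniformly mediocre. Which looks "better" depends on trimming.
    errors_a = [0.10, 0.12, 0.15, 0.18, 0.20, 2.50]
    errors_b = [0.25, 0.26, 0.27, 0.28, 0.29, 0.30]

    for prop in (0.0, 0.2):  # proportion trimmed from each tail
        ta = trim_mean(errors_a, prop)
        tb = trim_mean(errors_b, prop)
        print(f"trim={prop:.1f}: A={ta:.3f}, B={tb:.3f} -> "
              f"{'A' if ta < tb else 'B'} looks better")

    # With no trimming B wins (A's outlier dominates the mean);
    # with 20% trimming the outlier is cut and A wins.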