The main purpose of this paper is to stimulate discussion that may improve how we conduct empirical software engineering studies. Our position is that statistical hypothesis testing plays too large a role in empirical software engineering studies. The problems of applying statistical hypothesis testing in such studies are illustrated by the finding that only 3 of the 47 studies in Journal of Empirical Software Engineering that applied statistical hypothesis testing were able to base their testing on well-defined populations and random samples from those populations. The frequent use of statistical hypothesis testing may also have had unwanted consequences for study designs; for example, it may have contributed to too little focus on theory building. We outline several steps that we believe are useful for shifting the focus from “generalizing from a random sample to a larger population” to “generalizing across populations through theory-building”.