Generalization and theory-building in software engineering research

The main purpose of this paper is to stimulate discussion that may improve how we conduct empirical software engineering studies. Our position is that statistical hypothesis testing plays too large a role in such studies. The problems of applying statistical hypothesis testing in empirical software engineering are illustrated by the following finding: only 3 of the 47 studies in the Journal of Empirical Software Engineering that applied statistical hypothesis testing were able to base their testing on well-defined populations and random samples from those populations. The frequent use of statistical hypothesis testing may also have had unwanted consequences for study design; for example, it may have contributed to too little focus on theory building. We outline several steps that we believe are useful for shifting the focus from “generalizing from a random sample to a larger population” to “generalizing across populations through theory-building”.