Towards a software evolution benchmark

Case-studies are extremely popular in rapidly evolving research disciplines such as software engineering because they allow for a quick but fair assessment of new techniques. Unfortunately, a proper experimental set-up is rarely the case: all too often case-studies are based on a single small toy-example chosen to favour the technique under study. Such lack of scientific rigor prevents fair evaluation and has serious consequences for the credibility of our field. In this paper, we propose to use a representative set of cases as benchmarks for comparing various techniques dealing with software evolution. We hope that this proposal will launch a consensus building process that eventually must lead to a scientifically sound validation method for researchers investigating reverse- and re-engineering techniques.