Testing software using multiple versions

One aspect of the testing process that has received little research attention is the problem of determining the correct output for each test input. For many software systems, determining correct outputs is a difficult and time-consuming task. Some researchers have suggested that multiple, independently developed versions can be used to obviate the need for the a priori determination of correct outputs: the outputs of the versions are compared, and any differences are investigated. We call this method comparison testing. Comparison testing is appealing because the testing process can be automated easily when there is no need for the independent determination of outputs. However, all of the versions could produce identical incorrect outputs, in which case some test cases that produce failures would never be investigated.

The purpose of this research is to evaluate comparison testing. Parametric analytic models have been developed that reveal the effects of fault interrelationships on the ability of comparison testing to reveal a fault. The reliabilities of operational software systems that have been comparison tested are studied, and the factors that affect the cost of comparison testing are considered. Empirical evidence from a multi-version experiment is analyzed to provide example values for the model parameters.

We conclude that comparison testing can be used to ensure any required reliability if a sufficient number of versions is used, but that it may or may not be cost-justified for a given application. The components of an N-version programming system probably should not be tested using comparison testing.
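
The core mechanism described above, running several independently developed versions on the same inputs and flagging any disagreement for investigation, can be sketched in a few lines. The following Python sketch is illustrative only and is not taken from the paper; the version functions, the seeded fault, and all names are hypothetical.

    from typing import Any, Callable, Iterable, Sequence

    def comparison_test(
        versions: Sequence[Callable[[Any], Any]],
        test_inputs: Iterable[Any],
    ) -> list[tuple[Any, list[Any]]]:
        """Run every version on every test input and collect the inputs
        on which the versions disagree.

        Note: if all versions produce the same *incorrect* output for an
        input, that failure is not flagged -- the blind spot that the
        analytic models in this work quantify.
        """
        disagreements = []
        for x in test_inputs:
            outputs = [version(x) for version in versions]
            # Flag the input if any two versions disagree.
            if any(out != outputs[0] for out in outputs[1:]):
                disagreements.append((x, outputs))
        return disagreements

    # Hypothetical example: three independently written implementations
    # of integer square root, one of which contains a seeded fault.
    def isqrt_a(n): return int(n ** 0.5)

    def isqrt_b(n):
        r = 0
        while (r + 1) * (r + 1) <= n:
            r += 1
        return r

    def isqrt_c(n): return int(n ** 0.5) + (1 if n == 15 else 0)  # seeded fault

    if __name__ == "__main__":
        for x, outs in comparison_test([isqrt_a, isqrt_b, isqrt_c], range(20)):
            print(f"input {x}: versions disagree, outputs {outs}")

Running the sketch flags only input 15, where the faulty version diverges from the other two; a fault that made all three versions agree on a wrong answer would pass silently, which is why the number of versions and their fault interrelationships govern the reliability that comparison testing can ensure.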