Design and use of reference data sets for testing scientific software

Abstract

A general methodology for evaluating the accuracy of results produced by scientific software has been developed at the National Physical Laboratory. The approach is based on black-box testing using reference data sets and corresponding reference results, both generated in a manner consistent with the functional specification of the problem the software addresses. The results returned by the software for the reference data are compared objectively with the reference results, using quality metrics that account for the key aspects of the problem. This paper shows how reference data sets can be designed for testing software implementations of solutions to a broad class of problems arising throughout science, how these data sets can be used in practice, and how the results provided by software under test can properly be compared with reference results. The approach is illustrated with three examples: (i) mean and standard deviation, (ii) straight-line fitting, and (iii) principal components analysis. Software for such problems is used routinely in many fields, including optical spectrometry.
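
As an informal illustration of the first example (mean and standard deviation), the following Python sketch treats two standard-deviation routines as black boxes and compares their output with a reference result known in advance. It is not the procedure or the quality metric defined in the paper: the function names (`reference_data`, `one_pass_std`, `two_pass_std`, `correct_digits`), the choice of equispaced data, and the "number of correct significant digits" measure are all assumptions made for this sketch. Equispaced data are convenient because the sample standard deviation of n points with spacing h is h·sqrt(n(n+1)/12) regardless of the mean, so the mean can be increased to make the problem progressively harder.

```python
import math
import sys

# Upper limit on the digits a double-precision result can be credited with.
MAX_DIGITS = -math.log10(sys.float_info.epsilon)


def reference_data(n, mean, spacing=1.0):
    """Return equispaced data with a known mean and sample standard deviation.

    Hypothetical generator: n points centred on `mean`, `spacing` apart.
    The sample standard deviation is spacing * sqrt(n * (n + 1) / 12).
    """
    offsets = [i - (n - 1) / 2.0 for i in range(n)]
    data = [mean + spacing * k for k in offsets]
    ref_std = spacing * math.sqrt(n * (n + 1) / 12.0)
    return data, mean, ref_std


def one_pass_std(data):
    """Software under test (a): textbook one-pass formula, prone to cancellation."""
    n = len(data)
    total = 0.0
    total_sq = 0.0
    for x in data:
        total += x
        total_sq += x * x
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return math.sqrt(max(variance, 0.0))  # guard against a negative rounding result


def two_pass_std(data):
    """Software under test (b): two-pass formula using deviations from the mean."""
    n = len(data)
    total = 0.0
    for x in data:
        total += x
    mean = total / n
    sum_sq_dev = 0.0
    for x in data:
        sum_sq_dev += (x - mean) ** 2
    return math.sqrt(sum_sq_dev / (n - 1))


def correct_digits(computed, reference):
    """Assumed quality measure: approximate number of correct significant digits."""
    if computed == reference:
        return MAX_DIGITS
    relative_error = abs(computed - reference) / abs(reference)
    return min(MAX_DIGITS, max(0.0, -math.log10(relative_error)))


# Apply both routines to reference data sets of increasing difficulty.
for mean in (0.0, 1.0e4, 1.0e8):
    data, _, ref_std = reference_data(11, mean)
    print(
        f"mean={mean:g}: "
        f"one-pass {correct_digits(one_pass_std(data), ref_std):.1f} digits, "
        f"two-pass {correct_digits(two_pass_std(data), ref_std):.1f} digits"
    )
```

Integer-valued data are used so that the reference data are represented exactly in floating point, and any loss of accuracy can be attributed to the routine under test and not to the data.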