Characterisation of the representativity of selected sets of samples in multivariate calibration and pattern recognition

Abstract Whenever samples are extracted from a larger population, the extracted set should be representative of the original population. Two statistical tests are proposed to compare two data sets and assess their representativity. The first compares the variance-covariance matrices of the two data sets: if the matrices are equal, the two data sets have the same orientation in space and a similar spread of data points around the mean. The second uses the Mahalanobis distance between the centroids of the two sets to determine whether the centroids occupy the same position. The presented results show that these tests, when applied together, can be used as a diagnostic of representativity.
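The two checks can be sketched in a few lines of Python. The abstract does not name the specific test statistics, so the choices here are assumptions: Box's M statistic (with its chi-square approximation) for the equality of the variance-covariance matrices, and a Hotelling-type two-sample F statistic built on the squared Mahalanobis distance between the centroids, computed with the pooled covariance matrix. The function names `box_m` and `centroid_test` are hypothetical, and both functions treat the two sets as independent samples, which is only approximately true when one set is extracted from the other.

```python
import numpy as np


def _pooled_cov(a, b):
    """Return per-set covariances, degrees of freedom and the pooled covariance."""
    dofs = np.array([a.shape[0] - 1, b.shape[0] - 1], dtype=float)
    covs = [np.atleast_2d(np.cov(x, rowvar=False)) for x in (a, b)]
    pooled = (dofs[0] * covs[0] + dofs[1] * covs[1]) / dofs.sum()
    return covs, dofs, pooled


def box_m(a, b):
    """Box's M test statistic (chi-square approximation) for two data sets.

    Rows are samples, columns are variables. Returns (chi2_stat, df).
    Assumed choice of test; the source only says the matrices are compared.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    p = a.shape[1]
    covs, dofs, pooled = _pooled_cov(a, b)

    def logdet(s):
        return np.linalg.slogdet(s)[1]

    m = dofs.sum() * logdet(pooled) - sum(d * logdet(s) for d, s in zip(dofs, covs))
    # Correction factor for two groups (k - 1 = 1).
    c = (np.sum(1.0 / dofs) - 1.0 / dofs.sum()) * (2 * p * p + 3 * p - 1) / (6.0 * (p + 1))
    return m * (1.0 - c), p * (p + 1) // 2


def centroid_test(a, b):
    """Squared Mahalanobis distance between centroids and the derived F statistic.

    Returns (d2, f_stat, (df1, df2)). Uses the pooled covariance matrix,
    which presumes the covariance matrices were found to be equal.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2, p = a.shape[0], b.shape[0], a.shape[1]
    _, _, pooled = _pooled_cov(a, b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    d2 = float(diff @ np.linalg.solve(pooled, diff))
    t2 = n1 * n2 / (n1 + n2) * d2  # Hotelling's two-sample T^2
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return d2, f_stat, (p, n1 + n2 - p - 1)


# Illustrative synthetic data (not from the paper): a population and a subset.
rng = np.random.default_rng(0)
population = rng.normal(size=(200, 3))
subset = population[::4]

chi2_stat, chi2_df = box_m(population, subset)
d2, f_stat, f_df = centroid_test(population, subset)
print(chi2_stat, chi2_df, d2, f_stat, f_df)
```

Each statistic is then compared with the critical value of its reference distribution (chi-square with `chi2_df` degrees of freedom, F with `f_df` degrees of freedom), for example via `scipy.stats.chi2.sf` and `scipy.stats.f.sf` if SciPy is available. Running the covariance comparison first matches the order in the abstract, and it also justifies the use of a pooled covariance matrix in the centroid comparison.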