The results from an interlaboratory evaluation are said to be statistically consistent if they fit a normal (Gaussian) consistency model, which postulates that the results share the same unknown expected value and have the stated variances and covariances. A modern method for checking the fit of a statistical model to data is posterior predictive checking, a Bayesian adaptation of classical hypothesis testing. In this paper we propose using posterior predictive checking to check the fit of the normal consistency model to interlaboratory results. If the model fits reasonably well, the results may be regarded as statistically consistent. The principle of posterior predictive checking is that the realized results should look plausible under a posterior predictive distribution: the conditional distribution, given the realized results, of potential results that could be obtained in contemplated replications of the interlaboratory evaluation under the statistical model. A systematic discrepancy between potential results drawn from the posterior predictive distribution and the realized results indicates a potential failing of the model. One can investigate any number of potential discrepancies between the model and the results. We discuss an overall measure of discrepancy for checking the consistency of a set of interlaboratory results. We also discuss two sets of unilateral and bilateral measures of discrepancy: a unilateral discrepancy measure checks whether the result of a particular laboratory agrees with the statistical consistency model, and a bilateral discrepancy measure checks whether the results of a particular pair of laboratories agree with each other. The degree of agreement is quantified by the Bayesian posterior predictive p-value. The unilateral and bilateral measures of discrepancy and their posterior predictive p-values discussed in this paper apply to both correlated and independent interlaboratory results.
We suggest that the posterior predictive p-values may be used to assess unilateral and bilateral degrees of agreement in key comparisons of the International Committee for Weights and Measures (CIPM).
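The overall check described above can be sketched numerically. The following is a minimal illustration, not the paper's own implementation: it assumes independent results with stated standard uncertainties, a common unknown mean with a flat prior (so the posterior for the mean is normal about the weighted mean), and a Birge-type weighted chi-square as the overall discrepancy measure. The laboratory values and uncertainties are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lab results x and stated standard uncertainties u (illustrative values).
x = np.array([10.1, 9.9, 10.3, 10.0])
u = np.array([0.10, 0.15, 0.12, 0.20])

w = 1.0 / u**2
mu_hat = np.sum(w * x) / np.sum(w)   # weighted mean = posterior mean of mu under a flat prior
sd_mu = np.sqrt(1.0 / np.sum(w))     # posterior standard deviation of mu

def discrepancy(y, mu):
    # Overall (Birge-type) weighted chi-square discrepancy between results y and mean mu.
    return np.sum(w * (y - mu) ** 2)

# Posterior predictive p-value: fraction of contemplated replications whose
# discrepancy is at least as large as that of the realized results.
n_rep = 20000
exceed = 0
for _ in range(n_rep):
    mu = rng.normal(mu_hat, sd_mu)   # draw the common mean from its posterior
    y_rep = rng.normal(mu, u)        # draw replicated results from the predictive distribution
    if discrepancy(y_rep, mu) >= discrepancy(x, mu):
        exceed += 1

p_value = exceed / n_rep
print(f"posterior predictive p-value: {p_value:.3f}")
```

A small p-value would flag the set of results as statistically inconsistent; the unilateral and bilateral checks replace the overall discrepancy with per-laboratory or per-pair measures, and correlated results would require a full covariance matrix in place of the diagonal weights used here.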