How independent are multiple 'independent' diagnostic classifications?

Repeated applications of diagnostic tests are commonly used in clinical practice and epidemiologic research to assess test reliability, to increase sensitivity or specificity, or to correct for misclassification bias. An assumption almost universally made in this context is that test results are independent conditional on the true value. For dichotomous diagnostic tests (for example, disease present or absent), however, this assumption is usually untenable. The traits underlying a diagnosis typically lie on a continuum, and individuals near the (implicit or explicit) diagnostic cutpoint, such as the threshold of clinical detectability, are more likely to be misclassified than other individuals. This paper assesses the magnitude of the resulting correlation between diagnostic errors as a function of the distribution of the underlying trait, the magnitude of the measurement error, and the diagnostic threshold. It is concluded that the errors of diagnostic tests can be strongly correlated even if errors in perceiving the underlying trait are independent. Numerical examples illustrate that such positive correlation of diagnostic errors can substantially inflate commonly used reliability indices such as the kappa coefficient.
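The mechanism described above can be illustrated with a small Monte Carlo sketch. It assumes a hypothetical model that is not taken from the paper: a standard normal latent trait T, true status D = 1 when T exceeds a cutpoint c, and two tests that each add independent normal perception error (standard deviation sigma) to T before applying the same cutpoint. The function names (`simulate`, `phi_coefficient`, `cohen_kappa`) and the default parameter values are illustrative choices only. The sketch estimates the correlation of the two tests' errors within each true-status stratum, and compares the observed kappa with the kappa that conditional independence would imply for the same prevalence, sensitivity, and specificity.

```python
"""Monte Carlo sketch: correlated diagnostic errors from a dichotomized latent trait.

Hypothetical model (an assumption for illustration, not the paper's own model):
  T ~ N(0, 1)                    true underlying trait
  D = 1 if T > c                 true status
  X_i = T + e_i, e_i ~ N(0, sigma^2), independent, i = 1, 2
  Y_i = 1 if X_i > c             observed classification by test i
"""
import math
import random


def phi_coefficient(pairs):
    """Pearson correlation of two 0/1 variables given as (a, b) pairs."""
    n = len(pairs)
    n11 = sum(1 for a, b in pairs if a and b)
    n1_ = sum(a for a, _ in pairs)
    n_1 = sum(b for _, b in pairs)
    den = math.sqrt(n1_ * (n - n1_) * n_1 * (n - n_1))
    return (n * n11 - n1_ * n_1) / den if den else float("nan")


def cohen_kappa(po, q1, q2):
    """Cohen's kappa from observed agreement po and the two positive rates."""
    pe = q1 * q2 + (1 - q1) * (1 - q2)  # chance agreement
    return (po - pe) / (1 - pe)


def simulate(n=200_000, sigma=0.5, c=1.0, seed=1):
    rng = random.Random(seed)
    rows = []  # (d, y1, y2) per simulated individual
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        y1 = int(t + rng.gauss(0.0, sigma) > c)  # independent perception errors
        y2 = int(t + rng.gauss(0.0, sigma) > c)
        rows.append((int(t > c), y1, y2))

    pos = [(y1, y2) for d, y1, y2 in rows if d == 1]
    neg = [(y1, y2) for d, y1, y2 in rows if d == 0]

    # Error indicators: false negatives among D = 1, false positives among D = 0.
    r_pos = phi_coefficient([(1 - a, 1 - b) for a, b in pos])
    r_neg = phi_coefficient(neg)

    p = len(pos) / n                                  # prevalence
    se = sum(a + b for a, b in pos) / (2 * len(pos))  # pooled sensitivity
    sp = sum(2 - a - b for a, b in neg) / (2 * len(neg))  # pooled specificity

    # Observed kappa between the two tests.
    po = sum(1 for _, y1, y2 in rows if y1 == y2) / n
    q1 = sum(y1 for _, y1, _ in rows) / n
    q2 = sum(y2 for _, _, y2 in rows) / n
    k_obs = cohen_kappa(po, q1, q2)

    # Kappa implied by conditional independence with the same p, Se, Sp.
    q = p * se + (1 - p) * (1 - sp)
    po_ind = p * (se**2 + (1 - se) ** 2) + (1 - p) * (sp**2 + (1 - sp) ** 2)
    k_ind = cohen_kappa(po_ind, q, q)

    return {
        "error_corr_diseased": r_pos,
        "error_corr_nondiseased": r_neg,
        "sensitivity": se,
        "specificity": sp,
        "kappa_observed": k_obs,
        "kappa_conditional_independence": k_ind,
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions the within-stratum error correlations come out positive although the perception errors e_1 and e_2 are independent, because both tests are more likely to err for individuals whose trait lies close to c. Varying `sigma` and `c` gives a rough feel for how the size of this effect depends on measurement error and threshold placement.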