In Reply.—Dr Berry appears to suggest that accuracy should be used in studies such as mine to evaluate the scientific usefulness of peer assessments. However, the calculation of accuracy or other measures of validity requires a "gold standard" against which these ratings can be measured. No such gold standard currently exists; instead, as discussed in my article, peer assessment is typically used as the standard against which the validity of other measures of quality is evaluated. Thus, the scientific value of peer ratings can only be studied by measuring agreement among reviewers. The absence of a gold standard is typical of studies in which observer variability is assessed through the use of κ. The only question usually facing investigators in these studies is what measure of agreement to use, not whether to measure agreement or accuracy. Berry also cites the often-noted relationship between prevalence and κ values [1-5] as
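The prevalence effect referred to above can be made concrete with a small numerical sketch. The Python snippet below computes Cohen's κ for two pairs of raters whose 2×2 tables show the same 90% raw agreement but very different marginal prevalence; the counts are hypothetical and chosen only to illustrate the "high agreement but low kappa" pattern discussed in references 1 and 2, not taken from any study cited here.

```python
# Illustrative only: hypothetical 2x2 agreement tables for two raters.
# Each table is [[a, b], [c, d]]: a = both rate "yes", d = both rate "no",
# b and c = disagreements. The counts are invented to show how marginal
# prevalence alone can change kappa at a fixed level of raw agreement.

def cohen_kappa(table):
    """Cohen's kappa for a 2x2 table of counts from two raters."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_observed = (a + d) / n                       # raw percent agreement
    p_yes1, p_yes2 = (a + b) / n, (a + c) / n      # each rater's "yes" rate
    p_expected = p_yes1 * p_yes2 + (1 - p_yes1) * (1 - p_yes2)  # chance agreement
    return (p_observed - p_expected) / (1 - p_expected)

# Balanced prevalence: 90% raw agreement, about half the cases rated "yes".
balanced = [[45, 5], [5, 45]]
# Skewed prevalence: still 90% raw agreement, but "yes" is rare.
skewed = [[5, 5], [5, 85]]

print(cohen_kappa(balanced))  # ~0.80
print(cohen_kappa(skewed))    # ~0.44 despite identical raw agreement
```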
[1] Feinstein A, et al. High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology. 1990.
[2] Feinstein A, et al. High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology. 1990.
[3] Kraemer H. Ramifications of a population model for κ as a coefficient of reliability. 1979.
[4] Fleiss J, et al. Quantification of agreement in psychiatric diagnosis revisited. Archives of General Psychiatry. 1987.
[5] Andreasen N, et al. Reliability studies of psychiatric diagnosis: theory and practice. Archives of General Psychiatry. 1981.