Establishing the interrater reliability of instruments is an important issue in nursing research and practice. Morris et al.'s (2008) paper highlights the problem of choosing an appropriate statistical approach for the analysis of interrater reliability data, and the authors raise the important and relevant question of how to interpret kappa-like statistics such as Cohen's kappa (κ) or the weighted kappa (κw). It is true that the so-called 'chance-corrected' κ has frequently been criticised because its value depends on the prevalence of the rated trait in the sample (the 'base rate problem'). Consequently, even if two raters agree nearly or exactly, κ coefficients are near or equal to 0 when the prevalence of the rated characteristic is very high or very low. This contradicts the natural expectation that interrater reliability must be high as well. However, this is neither a limitation nor a 'main drawback' (p. 646). In fact, it is a desired property, because κ coefficients are classical interrater reliability coefficients (Dunn, 2004; Kraemer et al., 2002; Landis and Koch, 1975).

In classical test theory, reliability is defined as the ratio of the variability between subjects (or targets) to the total variability, where the total variability is the sum of the subject (target) variability and the measurement error (Dunn, 2004; Streiner and Norman, 2003); in compact form, reliability = variance(subjects) / (variance(subjects) + variance(error)). Consequently, if the variance between the subjects is very small or even zero, the reliability coefficient will be near zero as well. Therefore, reliability coefficients do not only reflect the degree of agreement between raters, but also the degree to which a measurement instrument is able to differentiate between the subjects of a sample.
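To make the base rate effect concrete, the following minimal sketch (in Python, using an invented 2 × 2 contingency table that is purely illustrative and not taken from Morris et al.) computes κ for two raters who agree on 98 of 100 subjects when the prevalence of the trait is near 99 per cent:

    def cohen_kappa(table):
        # Cohen's kappa from a square inter-rater contingency table.
        n = sum(sum(row) for row in table)
        p_o = sum(table[i][i] for i in range(len(table))) / n  # observed agreement
        rows = [sum(row) / n for row in table]                 # rater A marginal proportions
        cols = [sum(col) / n for col in zip(*table)]           # rater B marginal proportions
        p_e = sum(r * c for r, c in zip(rows, cols))           # agreement expected by chance
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical sample: 100 subjects, trait present in roughly 99 per cent.
    # The raters agree on 98 subjects (98 'present/present', 0 'absent/absent').
    table = [[98, 1],
             [1, 0]]
    print(cohen_kappa(table))  # about -0.01, despite 98 per cent raw agreement

    # The same point from the classical test theory side (invented variances):
    # with almost no variance between subjects, the variance ratio that
    # defines reliability is itself close to zero, however small the error.
    var_subjects, var_error = 0.001, 0.05
    print(var_subjects / (var_subjects + var_error))  # about 0.02

Here the observed agreement is p_o = 0.98, but the agreement expected by chance is p_e = 0.99 × 0.99 + 0.01 × 0.01 = 0.9802, so κ = (0.98 − 0.9802)/(1 − 0.9802) ≈ −0.01: near-perfect raw agreement, yet a κ near zero, which is exactly the behaviour discussed above for samples with extreme base rates.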
[1] J. Fleiss. Statistical Methods for Rates and Proportions. 1974.
[2] J. R. Landis and G. G. Koch. A review of statistical methods in the analysis of data arising from observer reliability studies (Part II). 1975.
[3] M. Szklo, et al. Epidemiology: Beyond the Basics. 1999.
[4] R. Morris, et al. Ambiguities and conflicting results: the limitations of the kappa statistic in establishing the interrater reliability of the Irish nursing minimum data set for mental health: a discussion paper. International Journal of Nursing Studies, 2008.
[5] P. Shrout. Measurement reliability and agreement in psychiatry. Statistical Methods in Medical Research, 1998.
[6] G. Dunn, et al. Statistical Evaluation of Measurement Errors: Design and Analysis of Reliability Studies. 2004.
[7] H. Kraemer. Ramifications of a population model for κ as a coefficient of reliability. 1979.
[8] J. Fleiss, et al. Statistical Methods for Rates and Proportions. 1973.
[9] D. Streiner, et al. Health Measurement Scales: A Practical Guide to Their Development and Use. 1989.
[10] H. Kraemer, et al. Kappa coefficients in medical research. Statistics in Medicine, 2002.