Measuring Interrater Agreement for Ratings of a Single Target

Researchers assessing interrater agreement for ratings of a single target have increasingly used the rWG(J) index, but have found that it can display irregular behavior. Mathematical analyses show that this problem arises from using random response, operationalized as the variance of a uniform distribution (σ²EU), as the baseline of comparison. These analyses suggest that researchers should continue to use rWG(J) as a summary measure of interrater agreement, but should use maximum dissensus as the reference distribution when computing rWG(J). Although values of σ²EU can be descriptively misleading, they provide an important inferential baseline. Thus, σ²EU should be used in computing χ² tests of the departure of the observed response variance from random responding. Researchers should also examine interrater agreement as a theoretical variable in its own right, investigating the causes and consequences of rater dissensus.
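The two baselines discussed above can be sketched in code. The following is a minimal illustration, not an implementation from this paper: it uses the standard multi-item form rWG(J) = J·(1 − s̄²/σ²) / [J·(1 − s̄²/σ²) + s̄²/σ²], where s̄² is the mean observed item variance. The function name `rwg_j` is mine; σ²EU = (A² − 1)/12 is the variance of a discrete uniform distribution over an A-point scale, and the maximum-dissensus variance (A − 1)²/4 assumes raters split evenly between the two scale endpoints.

```python
import numpy as np

def rwg_j(ratings, a, null="dissensus"):
    """Interrater agreement for one target rated by N raters on J items.

    ratings : (n_raters, J) array of responses on an A-point scale.
    a       : number of response options (A).
    null    : reference (null) distribution for "no agreement":
              'uniform'   -> sigma^2_EU = (A^2 - 1) / 12 (random responding)
              'dissensus' -> sigma^2_MV = (A - 1)^2 / 4  (raters split evenly
                             between the scale endpoints; illustrative choice)
    """
    ratings = np.asarray(ratings, dtype=float)
    j = ratings.shape[1]
    # Mean of the J observed item variances (sample variances across raters)
    s2 = ratings.var(axis=0, ddof=1).mean()
    sigma2 = (a**2 - 1) / 12 if null == "uniform" else (a - 1) ** 2 / 4
    ratio = 1 - s2 / sigma2
    return (j * ratio) / (j * ratio + s2 / sigma2)
```

For example, five raters giving near-identical ratings on two 5-point items yield a value near 1, and perfect agreement (zero observed variance) yields exactly 1 under either baseline; because (A − 1)²/4 exceeds (A² − 1)/12 for any A > 1, the maximum-dissensus baseline always produces the larger agreement value for the same data.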