Accurate tests of statistical significance for r(WG) and average deviation interrater agreement indexes.

The authors demonstrate that the statistical significance test most commonly used with r(WG)-type interrater agreement indexes in applied psychology, which is based on the chi-square distribution, is flawed and inaccurate. The chi-square test is shown to be extremely conservative even at modest, standard significance levels (e.g., .05). The authors present an alternative significance test based on Monte Carlo procedures. It produces the equivalent of an approximate randomization test of the null hypothesis that the actual distribution of responding is rectangular, and the authors demonstrate its superiority to the chi-square test. Finally, the authors provide tables of critical values and offer downloadable software that implements the approximate randomization test for r(WG)-type and average deviation (AD)-type interrater agreement indexes. The implications of these results for studying a broad range of interrater agreement problems in applied psychology are discussed.
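The Monte Carlo idea described above can be illustrated with a minimal Python sketch. This is not the authors' software, and several choices here are assumptions made for illustration: the single-item form of r(WG) (one minus the ratio of observed variance to the rectangular-null variance, (A^2 - 1)/12 for A response options), the sample variance with an n - 1 denominator, the AD index computed around the item mean, the function names, and the (count + 1)/(simulations + 1) convention for the Monte Carlo p value.

```python
import random
import statistics


def r_wg(ratings, n_options):
    """Single-item r(WG): 1 - observed variance / rectangular-null variance."""
    # Variance of a discrete uniform distribution on 1..A is (A^2 - 1) / 12.
    null_var = (n_options ** 2 - 1) / 12.0
    # Assumption: sample variance with an n - 1 denominator.
    return 1.0 - statistics.variance(ratings) / null_var


def ad_mean(ratings):
    """Average absolute deviation of the ratings from their mean."""
    center = statistics.fmean(ratings)
    return sum(abs(x - center) for x in ratings) / len(ratings)


def simulate_null(stat, n_raters, n_options, n_sims=10000, seed=0):
    """Values of `stat` when every rater responds uniformly at random on 1..A."""
    rng = random.Random(seed)
    return [
        stat([rng.randint(1, n_options) for _ in range(n_raters)])
        for _ in range(n_sims)
    ]


def mc_p_value(observed, null_values, larger_is_more_agreement=True):
    """Share of null draws showing at least as much agreement as observed."""
    if larger_is_more_agreement:
        extreme = sum(v >= observed for v in null_values)
    else:
        extreme = sum(v <= observed for v in null_values)
    # Assumption: add-one convention so the estimated p value is never zero.
    return (extreme + 1) / (len(null_values) + 1)


if __name__ == "__main__":
    # Hypothetical data: 10 raters on a 5-point scale.
    ratings = [4, 4, 5, 4, 3, 4, 4, 5, 4, 4]
    n_options = 5

    null_rwg = simulate_null(lambda r: r_wg(r, n_options), len(ratings), n_options)
    null_ad = simulate_null(ad_mean, len(ratings), n_options)

    print("r(WG):", r_wg(ratings, n_options),
          "p:", mc_p_value(r_wg(ratings, n_options), null_rwg))
    # Smaller AD means more agreement, so the lower tail is the extreme one.
    print("AD:", ad_mean(ratings),
          "p:", mc_p_value(ad_mean(ratings), null_ad, larger_is_more_agreement=False))
```

Because the null distribution is simulated for the exact number of raters and response options at hand, the test does not rely on a chi-square approximation. A critical value at a given significance level could be read from the sorted simulated values in the same way.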
