On Average Deviation Indices for Estimating Interrater Agreement

The authors report the results of two studies designed to examine the efficacy of two proposed indices of interrater agreement based on average deviations from the mean and from the median (AD_M and AD_Md, respectively). Using survey response data collected from 6,549 sales employees in 119 stores of a national retail company, Study 1 compared the results of six interrater agreement indices across four Likert-type response scales (i.e., 5-, 6-, 7-, and 11-point scales). The results indicated that the AD indices were highly correlated with an index of proportional agreement and with within-group interrater agreement indices. Study 2, based on survey data collected from 4,158 sales employees in 109 other stores of the same company, constructively replicated Study 1 and examined the consistency of interrater agreement decisions across the six indices with respect to a priori decision rules. Study 2 results also supported the use of AD indices. Practical issues concerning the use of AD indices for estimating interrater agreement and directions for future research are discussed.
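
As a minimal illustration of the two indices named above, the sketch below computes an average deviation for a single item rated by one group. It assumes the usual reading of "average deviation": the mean absolute difference between each rater's response and the group mean (for the mean-based index) or the group median (for the median-based index). The function names `ad_mean` and `ad_median` are hypothetical and chosen only for this example; the abstract does not spell out the formulas, so this should be read as a sketch under that assumption rather than as the authors' exact procedure.

```python
from statistics import mean, median


def ad_mean(ratings):
    """Average absolute deviation of ratings from their mean (assumed form of the mean-based AD index)."""
    center = mean(ratings)
    return sum(abs(r - center) for r in ratings) / len(ratings)


def ad_median(ratings):
    """Average absolute deviation of ratings from their median (assumed form of the median-based AD index)."""
    center = median(ratings)
    return sum(abs(r - center) for r in ratings) / len(ratings)


# Hypothetical responses from four raters on a 5-point scale.
ratings = [1, 2, 2, 5]
print(ad_mean(ratings))    # 1.25
print(ad_median(ratings))  # 1.0
```

Both values are expressed in the units of the response scale, so smaller values indicate closer agreement among raters; a value of 0 means every rater gave the same response.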
