Interrater reliability and agreement of subjective judgments

Indexes of interrater reliability and agreement are reviewed, and suggestions are made regarding their use in counseling psychology research. The distinction between agreement and reliability is clarified, and the relationships between these indexes and the level of measurement and type of replication are discussed. Indexes of interrater reliability appropriate for use with ordinal and interval scales are considered. The intraclass correlation as a measure of interrater reliability is discussed in terms of the treatment of between-raters variance and the appropriateness of reliability estimates based on composite or individual ratings. The advisability of optimal weighting schemes for calculating composite ratings is also considered. Measures of interrater agreement for ordinal and interval scales are described, as are measures of interrater agreement for data at the nominal level of measurement.
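
As a rough illustration of two of the index families named above, the sketch below computes Cohen's kappa for two raters on a nominal scale, and a one-way intraclass correlation for interval ratings. It is not taken from the paper itself. The function names (`cohen_kappa`, `one_way_icc`) and the toy data are hypothetical, and the one-way model is only one possible treatment of between-raters variance: it pools that variance into error, which is an assumption of this sketch rather than the article's recommendation.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement for two raters on a nominal scale.

    Undefined (division by zero) when expected agreement equals 1,
    e.g. when both raters use a single category throughout.
    """
    n = len(rater_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def one_way_icc(ratings):
    """One-way ANOVA intraclass correlation (assumed model, see lead-in).

    `ratings` is a list of rows, one row per rated subject and one
    column per rater. Returns (individual, composite): the reliability
    of a single rating and of the mean of all k ratings.
    """
    n, k = len(ratings), len(ratings[0])
    row_means = [sum(row) / k for row in ratings]
    grand_mean = sum(row_means) / n
    # Mean square between subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Mean square within subjects (rater differences are pooled in here).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    individual = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    composite = (ms_between - ms_within) / ms_between
    return individual, composite
```

For the toy nominal data `["x", "x", "y", "y"]` and `["x", "x", "y", "x"]`, `cohen_kappa` gives 0.5. For the ratings `[[1, 2], [3, 4], [5, 6]]`, the composite value from `one_way_icc` is 0.9375, which matches what the Spearman-Brown formula yields when applied to the individual-rating value with k = 2.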
