Reliability in Content Analysis: Some Common Misconceptions and Recommendations

In a recent article in this journal, Lombard, Snyder-Duch, and Bracken (2002) surveyed 200 content analyses for their reporting of reliability tests, compared the virtues and drawbacks of five popular reliability measures, and proposed guidelines and standards for their use. Their discussion revealed that numerous misconceptions circulate in the content analysis literature regarding how these measures behave and can aid or deceive content analysts in their effort to ensure the reliability of their data. This article proposes three conditions for statistical measures to serve as indices of the reliability of data and examines the mathematical structure and the behavior of the five coefficients discussed by the authors, as well as two others. It compares common beliefs about these coefficients with what they actually do and concludes with alternative recommendations for testing reliability in content analysis and similar data-making efforts.
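To make concrete the kind of behavioral differences the article examines, the sketch below (not from the article; a minimal Python illustration, assuming the simplest setting of two coders, nominal categories, and no missing data) computes four of the agreement coefficients at issue: percent agreement, Scott's pi, Cohen's kappa, and Krippendorff's alpha. The function name and the sample data are hypothetical.

```python
from collections import Counter


def agreement_coefficients(coder1, coder2):
    """Agreement coefficients for two coders, nominal categories, no
    missing data. Assumes more than one category occurs (otherwise the
    chance-corrected coefficients are undefined)."""
    assert len(coder1) == len(coder2) and coder1, "need paired judgments"
    n_units = len(coder1)

    # Percent (observed) agreement: share of units the coders match on.
    a_obs = sum(a == b for a, b in zip(coder1, coder2)) / n_units

    # Scott's pi: chance agreement from the pooled distribution of all
    # 2N judgments, treating the coders as interchangeable.
    pooled = Counter(coder1) + Counter(coder2)
    n_values = 2 * n_units
    a_exp_pi = sum((n_c / n_values) ** 2 for n_c in pooled.values())
    pi = (a_obs - a_exp_pi) / (1 - a_exp_pi)

    # Cohen's kappa: chance agreement from each coder's own marginals,
    # so kappa is sensitive to differences between the coders'
    # category preferences.
    p1, p2 = Counter(coder1), Counter(coder2)
    a_exp_k = sum((p1[c] / n_units) * (p2[c] / n_units) for c in set(pooled))
    kappa = (a_obs - a_exp_k) / (1 - a_exp_k)

    # Krippendorff's alpha (nominal): like pi, but expected disagreement
    # is computed without replacement, a correction that matters in
    # small samples.
    d_obs = 1 - a_obs
    d_exp = (n_values ** 2 - sum(n_c ** 2 for n_c in pooled.values())) \
        / (n_values * (n_values - 1))
    alpha = 1 - d_obs / d_exp

    return {"percent_agreement": a_obs, "pi": pi,
            "kappa": kappa, "alpha": alpha}


if __name__ == "__main__":
    coder1 = ["a", "a", "b", "b", "c", "a", "b", "c", "c", "a"]
    coder2 = ["a", "a", "b", "a", "c", "a", "b", "c", "b", "a"]
    for name, value in agreement_coefficients(coder1, coder2).items():
        print(f"{name}: {value:.3f}")
```

On this sample the four statistics already diverge (roughly .800, .690, .692, and .705, respectively), even though all are computed from the same judgments: pi and alpha derive expected agreement from the pooled distribution of both coders' values, kappa from each coder's separate marginals, and alpha additionally corrects the expectation for sampling without replacement.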

[1] L. Cronbach, Coefficient alpha and the internal structure of tests, 1951.

[2] R. Alpert, et al., Communications Through Limited-Response Questioning, 1954.

[3] W. A. Scott, et al., Reliability of Content Analysis: The Case of Nominal Scale Coding, 1955.

[4] C. E. Osgood, et al., The representational model and relevant research materials, 1959.

[5] J. Cohen, A Coefficient of Agreement for Nominal Scales, 1960.

[6] J. Guilford, et al., A Note on the G Index of Agreement, 1964.

[7] O. Holsti, Content Analysis for the Social Sciences and Humanities, 1969.

[8] A. E. Maxwell, Comparing the Classification of Subjects by Two Independent Judges, 1970, British Journal of Psychiatry.

[9] K. Krippendorff, Bivariate Agreement Coefficients for Reliability of Data, 1970.

[10] J. Fleiss, Statistical methods for rates and proportions, 1974.

[11] J. Fleiss, Measuring agreement between two judges on the presence or absence of a trait, 1975, Biometrics.

[12] K. Krippendorff, Reliability of binary attribute data, 1978, Biometrics.

[13] J. Vegelius, et al., On Generalizations of the G Index and the Phi Coefficient to Nominal Scales, 1979, Multivariate Behavioral Research.

[14] L. A. Goodman, et al., Measures of association for cross classifications, 1979.

[15] K. Krippendorff, Content Analysis: An Introduction to Its Methodology, 1980.

[16] D. J. Prediger, et al., Coefficient Kappa: Some Uses, Misuses, and Alternatives, 1981.

[17] R. T. Craig, Generalization of Scott's Index of Intercoder Agreement, 1981.

[18] R. Zwick, Another look at interrater agreement, 1988, Psychological Bulletin.

[19] W. D. Perreault, et al., Reliability of Nominal Data Based on Qualitative Judgments, 1989.

[20] M. A. Hughes, et al., Intercoder Reliability Estimation Approaches in Marketing: A Generalizability Theory Framework for Quantitative Data, 1990.

[21] W. J. Potter, et al., Rethinking validity and reliability in content analysis, 1999.

[22] K. A. Neuendorf, The Content Analysis Guidebook, 2001.

[23] M. Lombard, et al., Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability, 2002.