A review of statistical methods in the analysis of data arising from observer reliability studies (Part II)

Many research designs in studies of observer reliability give rise to categorical data via nominal scales (e.g., states of mental health such as normal, neurosis, and depression) or ordinal scales (e.g., stages of disease such as mild, moderate, and severe). In these situations, each of the d observers classifies each subject once into exactly one of a fixed set of L categories. As such, these designs are directly analogous to those giving rise to the standard ANOVA models in (2.1), (2.5), and (2.10) when the measurement scale is assumed to be quantitative. However, standard ANOVA procedures are rarely appropriate for the analysis of nominal- and ordinal-scaled data. As a result, these data are usually cross-classified into an L^d contingency table (one dimension per observer, L levels per dimension) and can then be analyzed by techniques developed for multidimensional contingency tables.
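As an illustration of the cross-classification step only, the following minimal sketch tabulates d observers' ratings of the same subjects into an L^d array of cell counts. The function name `cross_classify`, the integer coding of categories as 0..L-1, and the six-subject data set are assumptions made for this example; they do not come from the paper.

```python
import numpy as np

def cross_classify(ratings, n_categories):
    """Count subjects in each cell of the L^d contingency table.

    ratings: array-like of shape (n_subjects, d); entry [i, j] is the
             category (coded 0..L-1) that observer j assigned to subject i.
    n_categories: L, the number of categories on the scale.
    """
    ratings = np.asarray(ratings)
    n_subjects, d = ratings.shape
    # One axis per observer, each of length L.
    table = np.zeros((n_categories,) * d, dtype=int)
    # np.add.at accumulates correctly when several subjects fall in the same cell.
    np.add.at(table, tuple(ratings.T), 1)
    return table

# Hypothetical data: 6 subjects, d = 2 observers, L = 3 categories
# (e.g. 0 = mild, 1 = moderate, 2 = severe).
ratings = [[0, 0], [0, 1], [1, 1], [2, 2], [2, 2], [1, 2]]
table = cross_classify(ratings, n_categories=3)
print(table)
```

With d = 2 the result is the familiar L x L agreement table, whose diagonal cells hold the subjects on which the two observers agree. For larger d the same call yields a d-dimensional array, and the number of cells grows as L^d.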
