Interval Estimation for a Difference Between Intraclass Kappa Statistics

Model-based inference procedures for the kappa statistic have developed rapidly over the last decade. However, no method has yet been developed for constructing a confidence interval about a difference between independent kappa statistics that is valid in samples of small to moderate size. In this article, we propose and evaluate two such methods based on an idea proposed by Newcombe (1998, Statistics in Medicine, 17, 873-890) for constructing a confidence interval for a difference between independent proportions. The methods are shown to provide very satisfactory results in sample sizes as small as 25 subjects per group. Sample size requirements that achieve a prespecified expected width for a confidence interval about a difference between kappa statistics are also presented.
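
For orientation, the idea from Newcombe (1998) that the abstract refers to can be sketched for its original setting, a difference between two independent proportions. The sketch below is only an illustration of that building block, not the kappa procedures proposed in the article: it computes a Wilson score interval for each proportion and combines the two sets of limits into an interval for the difference. The function names and the default critical value z = 1.959964 (a two-sided 95% level) are assumptions made for this example.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a single binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half


def newcombe_difference_interval(x1, n1, x2, n2, z=1.959964):
    """Interval for p1 - p2 built from the two Wilson intervals.

    Hypothetical helper for illustration: the lower limit uses the
    distance from p1 to its lower limit and from p2 to its upper limit;
    the upper limit uses the opposite pair.
    """
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, z)
    l2, u2 = wilson_interval(x2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


if __name__ == "__main__":
    # Illustrative counts: 56 of 70 versus 48 of 80.
    lo, hi = newcombe_difference_interval(56, 70, 48, 80)
    print(f"difference = {56 / 70 - 48 / 80:.4f}, interval = ({lo:.4f}, {hi:.4f})")
```

Because each limit is built from the single-sample limits rather than from a symmetric standard error, the resulting interval need not be symmetric about the observed difference, which is part of why the approach behaves well in smaller samples.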

[1]  E. B. Wilson Probable Inference, the Law of Succession, and Statistical Inference , 1927 .

[2]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[3]  A. Donner,et al.  Statistical Inferences For Interobserver Agreement Studies With Nominal Outcome Data , 2001 .

[4]  Sadanori Konishi,et al.  Normalizing and variance stabilizing transformations for intraclass correlations , 1985 .

[5]  S. Greenland,et al.  A Comparison of the Performance of Model‐Based Confidence Intervals When the Correct Model Form Is Unknown: Coverage of Asymptotic Means , 1994, Epidemiology.

[6]  T. Hutchinson,et al.  Interobserver agreement by auscultation in the presence of a third heart sound in patients with congestive heart failure. , 1987, Chest.

[7]  J. Fleiss,et al.  Intraclass correlations: uses in assessing rater reliability. , 1979, Psychological bulletin.

[8]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[9]  Thomas J. Santner,et al.  Teaching Large‐Sample Binomial Confidence Intervals , 1998 .

[10]  L E Daly,et al.  Confidence limits made easy: interval estimation using a substitution method. , 1998, American journal of epidemiology.

[11]  K. Lui Confidence limits for the population prevalence rate based on the negative binomial distribution. , 1995, Statistics in medicine.

[12]  J. Koval,et al.  Interval estimation for Cohen's kappa as a measure of agreement. , 2000, Statistics in medicine.

[13]  W. G. Howe Approximate Confidence Limits on the Mean of X + Y Where X and Y Are Two Tabled Independent Random Variables , 1974 .

[14]  W. Stahel,et al.  Log-normal Distributions across the Sciences: Keys and Clues , 2001 .

[15]  C. A. Smith,et al.  ON THE ESTIMATION OF INTRACLASS CORRELATION , 1957 .

[16]  J. Bartko The Intraclass Correlation Coefficient as a Measure of Reliability , 1966, Psychological reports.

[17]  Calyampudi R. Rao,et al.  Comparison of LR, Score, and Wald Tests in a Non-IID Setting , 1997 .

[18]  S. Thompson,et al.  How should cost data in pragmatic randomised trials be analysed? , 2000, BMJ : British Medical Journal.

[19]  Allan Donner,et al.  Testing the equality of dependent intraclass correlation coefficients , 2002 .

[20]  J. Mclaughlin,et al.  Reliability of surrogate information on cigarette smoking by type of informant. , 1987, American journal of epidemiology.

[21]  M. Bartlett,et al.  APPROXIMATE CONFIDENCE INTERVALS , 1953 .

[22]  Lisa M. Schwartz,et al.  Misunderstandings about the effects of race and sex on physicians' referrals for cardiac catheterization. , 1999, The New England journal of medicine.

[23]  W. A. Scott,et al.  Reliability of Content Analysis: The Case of Nominal Scale Coding , 1955 .

[24]  R. Elston On the correlation between correlations , 1975 .

[25]  R G Newcombe,et al.  Estimating the difference between differences: measurement of additive scale interaction for proportions , 2001, Statistics in medicine.

[26]  C. Kowalski On the Effects of Non‐Normality on the Distribution of the Sample Product‐Moment Correlation Coefficient , 1972 .

[27]  John M Lachin,et al.  The role of measurement reliability in clinical trials , 2004, Clinical trials.

[28]  B. Efron,et al.  Second thoughts on the bootstrap , 2003 .

[29]  J. Fleiss,et al.  Interval estimation under two study designs for kappa with binary classifications. , 1993, Biometrics.

[30]  A. Donner,et al.  A comparison of confidence interval methods for the intraclass correlation coefficient. , 1986, Biometrics.

[31]  P. Shrout Measurement reliability and agreement in psychiatry , 1998, Statistical methods in medical research.

[32]  J. R. Landis,et al.  A one-way components of variance model for categorical data , 1977 .

[33]  R. Newcombe Two-sided confidence intervals for the single proportion: comparison of seven methods. , 1998, Statistics in medicine.

[34]  Robert A. Hultquist,et al.  Interval Estimation for the Unbalanced Case of the One-Way Random Effects Model , 1978 .

[35]  L. S. Feldt,et al.  Testing the Equality of Two Related Intraclass Reliability Coefficients , 1994 .

[36]  N. Klar,et al.  Inference Procedures for Assessing Interobserver Agreement among Multiple Raters , 2001, Biometrics.

[37]  J J Gart,et al.  Approximate interval estimation of the difference in binomial parameters: correction for skewness and extension to multiple tables. , 1990, Biometrics.

[38]  L. S. Feldt,et al.  Test of the Hypothesis That the Intraclass Reliability Coefficient is the Same for Two Measurement Procedures , 1992 .

[39]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[40]  S. Zeger,et al.  Longitudinal data analysis using generalized linear models , 1986 .

[41]  K. Rothman Epidemiology: An Introduction , 2002 .

[42]  H. Kraemer,et al.  2 x 2 kappa coefficients: measures of agreement or association. , 1989, Biometrics.

[43]  B. Efron,et al.  Bootstrap confidence intervals , 1996 .

[44]  Peter Hall,et al.  On the Removal of Skewness by Transformation , 1992 .

[45]  M. Banerjee,et al.  Bayesian Inference for Kappa from Single and Multiple Studies , 2000, Biometrics.

[46]  M Defontaine,et al.  In vivo performance of a matrix-based quantitative ultrasound imaging device dedicated to calcaneus investigation. , 2002, Ultrasound in medicine & biology.

[47]  Duncan Cramer,et al.  Measurement reliability and agreement , 1998 .

[48]  A Donner,et al.  A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance-testing and sample size estimation. , 1992, Statistics in medicine.

[49]  S. Greenland,et al.  Estimating causal effects. , 2002, International journal of epidemiology.

[50]  B O'Brien,et al.  Statistical analysis of cost effectiveness data. , 1999, The Journal of rheumatology.

[51]  F. Graybill,et al.  Confidence Intervals on Variance Components. , 1993 .

[52]  A. Donner A Review of Inference Procedures for the Intraclass Correlation Coefficient in the One-Way Random Effects Model , 1986 .

[53]  Z. Tu,et al.  A Better Confidence Interval for Kappa (κ) on Measuring Agreement between Two Raters with Binary Outcomes , 1994 .

[54]  J. Zhang,et al.  What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. , 1998, JAMA.

[55]  James F. Reed,et al.  Homogeneity of kappa statistics in multiple samples , 2000, Comput. Methods Programs Biomed..

[56]  J. Aitchison,et al.  The Lognormal Distribution. , 1958 .

[57]  W W Hauck,et al.  A consequence of omitted covariates when estimating odds ratios. , 1991, Journal of clinical epidemiology.

[58]  F. J. Evans,et al.  New Data from the Addiction Severity Index Reliability and Validity in Three Centers , 1985, The Journal of nervous and mental disease.

[59]  W. G. Cochran Errors of Measurement in Statistics , 1968 .

[60]  S D Walter,et al.  A reappraisal of the kappa coefficient. , 1988, Journal of clinical epidemiology.

[61]  A. Agresti,et al.  Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions , 1998 .

[62]  Douglas G. Bonett,et al.  An improved confidence interval for a linear function of binomial proportions , 2004, Comput. Stat. Data Anal..

[63]  J. Nam Interval Estimation of the Kappa Coefficient with Binary Classification and an Equal Marginal Probability Model , 2000, Biometrics.

[64]  T. Mak Analysing Intraclass Correlation for Dichotomous Variables , 1988 .

[65]  Keith E Muller,et al.  Improved approximate confidence intervals for the mean of a log‐normal random variable , 2002, Statistics in medicine.

[66]  M Nurminen,et al.  Comparative analysis of two rates. , 1985, Statistics in medicine.

[67]  W. Barlow Measurement of interrater agreement with adjustment for covariates. , 1996, Biometrics.

[68]  Jiun-Kae Jack Lee,et al.  A Better Confidence Interval for Kappa (κ) on Measuring Agreement between Two Raters with Binary Outcomes , 1994 .

[69]  M. Bartlett,et al.  APPROXIMATE CONFIDENCE INTERVALS. II. MORE THAN ONE UNKNOWN PARAMETER , 1953 .

[70]  Bradley Efron,et al.  Bootstrap Confidence Intervals , 1996 .

[71]  R. Newcombe,et al.  Interval estimation for the difference between independent proportions: comparison of eleven methods. , 1998, Statistics in medicine.

[72]  E. Faerstein,et al.  Reliability of the information about the history of diagnosis and treatment of hypertension. Differences in regard to sex, age, and educational level. The Pró-Saúde study. , 2001, Arquivos brasileiros de cardiologia.

[73]  Stephen J. Smith Evaluating the efficiency of the δ-distribution mean estimator , 1988 .

[74]  A Donner,et al.  Testing the equality of two dependent kappa statistics. , 2000, Statistics in medicine.

[75]  Xiao-Hua Zhou,et al.  Inferences about population means of health care costs , 2002, Statistical methods in medical research.

[76]  Neil Klar,et al.  An Estimating Equations Approach for Modelling Kappa , 2000 .

[77]  M. Eliasziw,et al.  Testing the homogeneity of kappa statistics. , 1996, Biometrics.

[78]  B. Toone,et al.  Computerized tomographic scan changes in early schizophrenia – preliminary findings , 1986, Psychological Medicine.

[79]  P. W. Lane,et al.  Analysis of covariance and standardization as instances of prediction. , 1982, Biometrics.

[80]  W. Barlow,et al.  A comparison of methods for calculating a stratified kappa. , 1990, Statistics in medicine.

[81]  K. McGraw,et al.  Forming inferences about some intraclass correlation coefficients. , 1996 .

[82]  Nathaniel Schenker,et al.  Qualms about Bootstrap Confidence Intervals , 1985 .

[83]  L. Kurland,et al.  Studies on multiple sclerosis in Winnipeg, Manitoba, and New Orleans, Louisiana. I. Prevalence; comparison between the patient groups in Winnipeg and New Orleans. , 1953, American journal of hygiene.

[84]  Rebecca Zwick,et al.  Another look at interrater agreement. , 1988, Psychological bulletin.

[85]  Dennis E. Jennings How Do We Judge Confidence-Interval Adequacy? , 1987 .

[86]  S Greenland,et al.  Standardized estimates from categorical regression models. , 1995, Statistics in medicine.

[87]  Mikhail Nikulin,et al.  Statistical planning and inference in accelerated life testing using the CHSS model , 2004 .