Diversity of decision-making models and the measurement of interrater agreement.

Several papers have criticized the kappa coefficient for its tendency to fluctuate with sample base rates. The importance of these criticisms is difficult to evaluate because they are framed in terms of a highly specific model of diagnostic decision making. In this article, diagnostic decision making is viewed as a special case of signal detection theory. Each diagnostic process is characterized by a function that relates the probability of a case receiving a positive diagnosis to the severity or salience of its symptoms. The shape of this diagnosability curve strongly affects the value of kappa obtained in a study of interrater reliability, how that value changes in response to variation in the base rates, and how closely it corresponds to the validity of the diagnostic decisions. When criterion diagnoses are unavailable for comparison, the common practice of evaluating a diagnostic procedure on the basis of the magnitude of the kappa coefficient observed in a reliability study is therefore questionable. New methods for measuring interrater agreement are needed, and possible directions for research in this area are discussed.
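For reference, Cohen's kappa corrects the observed agreement for the agreement expected by chance from the raters' marginal rates:

\[
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_e = p_1 p_2 + (1 - p_1)(1 - p_2),
\]

where \(p_o\) is the observed proportion of agreement and, for two raters making binary diagnoses, \(p_1\) and \(p_2\) are their marginal rates of positive diagnosis. Because \(p_e\) is computed from the marginals, kappa is sensitive to the base rate of the diagnosed condition, which is the source of the criticisms discussed above.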
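The base-rate effect can be illustrated with a minimal simulation sketch under one simple signal detection model (the case/noncase severity distributions, perceptual noise level, and shared decision threshold below are illustrative assumptions, not values from the article): two raters independently threshold noisy readings of a latent severity, which implies a probit-shaped diagnosability curve, and the resulting kappa shifts with the base rate even though the diagnostic process itself is unchanged.

    import numpy as np

    rng = np.random.default_rng(0)

    def cohens_kappa(a, b):
        # Observed agreement, and chance agreement from marginal positive rates.
        p_o = np.mean(a == b)
        p1, p2 = a.mean(), b.mean()
        p_e = p1 * p2 + (1 - p1) * (1 - p2)
        return (p_o - p_e) / (1 - p_e)

    def simulate_kappa(base_rate, n=200_000, mu_case=2.0, noise_sd=1.0, threshold=1.0):
        # Latent symptom severity: N(0, 1) for noncases, N(mu_case, 1) for cases.
        case = rng.random(n) < base_rate
        severity = rng.normal(0.0, 1.0, n) + np.where(case, mu_case, 0.0)
        # Each rater thresholds an independently noisy reading of the severity,
        # so P(positive | severity) = Phi((severity - threshold) / noise_sd):
        # a probit-shaped diagnosability curve, fixed across base rates.
        r1 = severity + rng.normal(0.0, noise_sd, n) > threshold
        r2 = severity + rng.normal(0.0, noise_sd, n) > threshold
        return cohens_kappa(r1, r2)

    for base_rate in (0.05, 0.20, 0.50):
        print(f"base rate {base_rate:.2f}: kappa = {simulate_kappa(base_rate):.3f}")

The diagnostic process, and hence the diagnosability curve, is identical in all three conditions, yet the three printed kappas differ, illustrating the base-rate dependence that the criticisms target.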