The Statistical Evaluation of Categorical Measurements: “Simple Scales, but Treacherous Complexity Underneath”

ABSTRACT The statistical evaluation of measurements on categorical scales is hampered by gaps in insight and conceptualization. Categorical scales have a simple mathematical structure, but the empirical reality that they aim to reflect usually has a very complex structure. This complexity poses intricate challenges for the statistical evaluation of the performance of categorical measurement systems. Most current techniques deal ineffectively with these challenges, relying on simplistic conditional independence assumptions and careless sampling strategies. Moreover, they typically evaluate measurement systems in terms of concepts that are not clearly related to a notion of measurement error. This article proposes an approach, based on characteristic curves, for modeling the behavior of categorical measurements. The approach is intended to facilitate the development of more effective techniques. It is applied in a case study that illustrates what the authors believe is a realistic degree of complexity.
