Consequences of dichotomization

Dichotomization is the transformation of a continuous outcome (response) into a binary outcome. This approach, while fairly common, is harmful from the viewpoint of statistical estimation and hypothesis testing. We show that dichotomization leads to a loss of information, which can be large. For normally distributed data, the loss in terms of Fisher information is at least 1 − 2/π (about 36%). In other words, 100 continuous observations are statistically equivalent to 158 dichotomized observations. The amount of information lost depends strongly on the a priori choice of cut points, and the optimal cut point depends on the unknown parameters. The loss of information translates into a loss of power or, conversely, into a larger sample size needed to maintain power. Only in certain cases, for instance, in estimating a value of the cumulative distribution function and when the assumed model is very different from the true model, can the use of dichotomized outcomes be considered a reasonable approach.
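The 36% and 158-observation figures can be reproduced with a short calculation. The sketch below is an illustration, not the authors' derivation: it assumes the simplest setting, in which the mean of a normal outcome with known variance is estimated and a single cut point lies z standard deviations from the true mean. Under that assumption, the Fisher information retained by one dichotomized observation, relative to one continuous observation, is φ(z)² / [Φ(z)(1 − Φ(z))]. The function names are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def relative_information(z: float) -> float:
    """Fraction of Fisher information about the mean that survives
    dichotomization at a cut point z standard deviations from the mean.

    Assumes a normal outcome with known variance (illustrative setting).
    """
    p = normal_cdf(z)
    return normal_pdf(z) ** 2 / (p * (1.0 - p))


def equivalent_sample_size(n_continuous: int, z: float) -> int:
    """Smallest number of dichotomized observations carrying at least as
    much information as n_continuous continuous observations."""
    return math.ceil(n_continuous / relative_information(z))


if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0, 2.0):
        kept = relative_information(z)
        print(f"cut at z={z:3.1f}: information kept {kept:.3f}, "
              f"lost {1.0 - kept:.3f}, "
              f"n equivalent to 100 continuous: {equivalent_sample_size(100, z)}")
```

With the cut point at the mean (z = 0), the retained fraction is 2/π, so the loss is 1 − 2/π and 100 continuous observations correspond to 158 dichotomized ones after rounding up. Moving the cut point away from the mean only increases the loss, which is why the best cut point depends on a parameter that is unknown in practice.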
