Understanding the effects of dichotomization of continuous outcomes on geostatistical inference

Abstract: Diagnosis is often based on whether a continuous health indicator exceeds a predefined cut-off value, which classifies patients as positive or negative for the disease under investigation. In this paper, we investigate the effects of dichotomizing spatially referenced continuous outcome variables on geostatistical inference. Although this issue has been studied extensively in other fields, dichotomization remains common practice in epidemiological studies, and its effects in the context of prevalence mapping are not yet fully understood. Here, we demonstrate how spatial correlation affects the loss of information due to dichotomization, how linear geostatistical models can be used to map disease prevalence and thus avoid dichotomization, and how dichotomization affects predictive inference on prevalence. To pursue these objectives, we develop a metric, based on the composite likelihood, that quantifies the potential loss of information after dichotomization without requiring binomial geostatistical models to be fitted. Through a simulation study and two applications to disease mapping in Africa, we show that the performance of binomial geostatistical models deteriorates substantially as the threshold used for dichotomization moves further from the mean of the underlying process. We also find that dichotomization can lead to the loss of fine-scale features of disease prevalence and to increased uncertainty in the parameter estimates, especially when the noise-to-signal ratio is large. These findings strongly support the conclusion of previous studies that dichotomization should be avoided whenever feasible.
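The dependence of the information loss on the threshold can be illustrated in a much simpler setting than the one studied here. The sketch below is not the composite-likelihood metric developed in the paper: it assumes independent Gaussian observations with known variance and ignores spatial correlation entirely. Under those assumptions, it computes the ratio of the Fisher information about the mean carried by the thresholded indicator to that carried by the continuous value. The function names are hypothetical and chosen for illustration only.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def relative_efficiency(threshold, mean=0.0, sd=1.0):
    """Fisher information about `mean` retained after dichotomizing.

    Assumption (not from the paper): X ~ N(mean, sd^2), independent
    observations, sd known. The continuous observation carries
    information 1 / sd^2; the indicator 1{X > threshold} carries
    pdf(z)^2 / (sd^2 * p * (1 - p)), where z is the standardized
    threshold and p = cdf(z). The returned value is their ratio.
    """
    z = (threshold - mean) / sd
    p = normal_cdf(z)
    return normal_pdf(z) ** 2 / (p * (1.0 - p))


if __name__ == "__main__":
    # Thresholds are expressed in standard deviations from the mean.
    for c in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"threshold = {c:.1f} sd, retained information = "
              f"{relative_efficiency(c):.3f}")
```

Even in this idealized case, a threshold placed exactly at the mean retains only 2/π (about 64%) of the information, and the retained fraction falls as the threshold moves into the tail. This is consistent in direction with the deterioration reported above for binomial geostatistical models, although the magnitudes in the spatial setting will differ.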
