The effect of spatial aggregation on performance when mapping a risk of disease

Background: Spatial data on cases are available either in point form (e.g. longitude/latitude) or aggregated by administrative region (e.g. zip code or census tract). Statistical methods for spatial data can accommodate either form, but spatial aggregation can affect their performance. Previous work has studied the effect of spatial aggregation on cluster detection methods. Here we consider geographic health data at different levels of spatial resolution, to study the effect of spatial aggregation on the performance of disease mapping in locating subregions of increased disease risk.

Methods: We implemented a non-parametric distance-based mapping (DBM) method for disease risk to produce a smooth map from spatially aggregated childhood leukaemia data. We then simulated spatial data under controlled conditions to study the effect of spatial aggregation on its performance. We used an evaluation method based on ROC curves to compare the performance of DBM across different geographic scales.

Results: Application of DBM to the leukaemia data illustrates the method as a useful visualization tool. Spatial aggregation produced the expected degradation of disease mapping performance. The characteristics of this degradation, however, varied with the interaction between the geographic extent of the higher-risk area and the level of aggregation. For example, higher-risk areas dispersed across several units did not suffer as greatly from aggregation. The choice of centroids also affected the resulting map.

Conclusions: DBM can be implemented for continuous and discrete spatial data, but the resulting map can lose accuracy in the discrete (aggregated) setting. Investigation of the simulations suggests a complex relationship between performance loss, the geographic extent of spatial disturbances, and centroid locations. Aggregation of spatial data destroys information and thus impedes efforts to monitor these data for spatial disturbances. The effect of spatial aggregation on cluster detection, disease mapping, and other useful methods in spatial epidemiology is complex and deserves further study.
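The evaluation idea described in Methods (simulate data with a known higher-risk area, aggregate to centroids, then score the resulting map with an ROC-based measure) can be illustrated with a minimal sketch. This is not the DBM method and does not reproduce the study: the risk surface here is a simple Gaussian kernel ratio used as a stand-in, and the unit-square study region, disc-shaped higher-risk area, risk levels, square aggregation grids, bandwidth, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def snap_to_centroids(xy, cells_per_side):
    """Aggregate: replace each point in the unit square by its grid-cell centroid."""
    idx = np.minimum((xy * cells_per_side).astype(int), cells_per_side - 1)
    return (idx + 0.5) / cells_per_side

def kernel_risk(cases, controls, grid, bandwidth):
    """Stand-in risk surface (NOT DBM): share of Gaussian kernel mass due to cases."""
    def intensity(points):
        d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1)
    c, k = intensity(cases), intensity(controls)
    return c / (c + k + 1e-12)  # small constant guards against division by zero

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count one half)."""
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

def run_demo(seed=0, n=2000):
    rng = np.random.default_rng(seed)
    pop = rng.random((n, 2))  # assumed population: uniform on the unit square
    centre, radius = np.array([0.3, 0.7]), 0.15  # assumed higher-risk disc

    def in_disc(p):
        return ((p - centre) ** 2).sum(axis=1) < radius ** 2

    # Assumed risks: 0.3 inside the disc, 0.1 elsewhere.
    is_case = rng.random(n) < np.where(in_disc(pop), 0.3, 0.1)

    # Evaluation grid and the true higher-risk labels at each grid point.
    axis = (np.arange(30) + 0.5) / 30
    grid = np.array([(x, y) for x in axis for y in axis])
    truth = in_disc(grid)

    results = {}
    for cells in (None, 20, 5):  # point data, fine grid, coarse grid
        xy = pop if cells is None else snap_to_centroids(pop, cells)
        risk = kernel_risk(xy[is_case], xy[~is_case], grid, bandwidth=0.05)
        name = "points" if cells is None else f"{cells}x{cells} grid"
        results[name] = auc(risk, truth)
    return results

results = run_demo()
for name, value in results.items():
    print(f"{name}: AUC = {value:.3f}")
```

Comparing the AUC values across resolutions gives one rough way to see how aggregation changes mapping performance in a single simulated data set. A real comparison would average over many simulations and vary the size and placement of the higher-risk area relative to the aggregation units.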
