Spatial regression and estimation of disease risks: A clustering-based approach

Detecting clusters and estimating incidence risks are important tasks in public health and epidemiological research. Popular spatial regression models for disease risks, such as conditional autoregressive (CAR) models, assume a known spatial dependence structure for the error distribution and a single set of regression parameters, common to the whole study region, for the mean structure. The structural assumption on spatial dependence is often difficult to justify, and the assumption of a common regression surface may not be realistic for a large spatial domain. We conceptualize a study region as a union of spatially connected clusters, where each cluster is composed of geographically adjacent regions. We propose a regression model with cluster-wise varying regression parameters. The model captures the spatial clustering structure, and the cluster-wise regression parameters are estimated given the estimated clustering configuration. The proposed model is flexible in terms of regional and global shrinkage as well as the number of clusters, cluster memberships, and cluster locations. We develop an algorithm based on the reversible jump Markov chain Monte Carlo (MCMC) method for model estimation. A numerical study demonstrates the effectiveness of the proposed methodology. The method is computationally efficient and thus amenable to large datasets. © 2016 Wiley Periodicals, Inc. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2016
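
The idea of a study region as a union of spatially connected clusters can be illustrated with a small check that every cluster in a candidate configuration forms one connected piece of the adjacency graph. The following is only a sketch of that constraint, not the authors' algorithm: the function names, the dictionary-based adjacency structure, and the 2 x 3 grid of regions are all hypothetical and chosen for illustration.

```python
from collections import deque


def is_connected(members, adjacency):
    """Return True if the regions in `members` form one connected piece.

    `adjacency` maps each region id to the ids of its geographic neighbors
    (a hypothetical representation chosen for this sketch).
    """
    members = set(members)
    if not members:
        return False
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    # Breadth-first search restricted to regions inside the cluster.
    while queue:
        region = queue.popleft()
        for neighbor in adjacency.get(region, ()):
            if neighbor in members and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == members


def is_valid_clustering(labels, adjacency):
    """Return True if every cluster in `labels` is spatially connected."""
    clusters = {}
    for region, label in labels.items():
        clusters.setdefault(label, set()).add(region)
    return all(is_connected(m, adjacency) for m in clusters.values())


# Hypothetical 2 x 3 grid of regions, numbered row by row:
#   0 1 2
#   3 4 5
ADJACENCY = {
    0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
    3: [0, 4], 4: [1, 3, 5], 5: [2, 4],
}

# Two connected clusters: {0, 1, 3} and {2, 4, 5}.
good = {0: "A", 1: "A", 3: "A", 2: "B", 4: "B", 5: "B"}
# Cluster "A" joins opposite corners 0 and 5, which are not adjacent.
bad = {0: "A", 5: "A", 1: "B", 2: "B", 3: "B", 4: "B"}

print(is_valid_clustering(good, ADJACENCY))
print(is_valid_clustering(bad, ADJACENCY))
```

A sampler that proposes changes to the number of clusters or to cluster memberships could use a check of this kind to reject configurations that break spatial connectivity, although the abstract does not say how the proposed method enforces the constraint.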
