Spatial disease mapping using directed acyclic graph auto-regressive (DAGAR) models.

Hierarchical models for regionally aggregated disease incidence data commonly involve region-specific latent random effects that are modelled jointly as multivariate Gaussian. The covariance or precision matrix encodes the spatial dependence between the regions. Common choices for the precision matrix include the widely used intrinsic conditional autoregressive model, which is singular, and its nonsingular extension, which lacks interpretability. We propose a new parametric model for the precision matrix based on a directed acyclic graph representation of the spatial dependence. Our model guarantees positive definiteness and hence, in addition to being a valid prior for spatially correlated regional random effects, can also directly model the outcome from dependent data such as images and networks. Theoretical and empirical results demonstrate the interpretability of the parameters in our model. Our precision matrix is sparse, and the model is highly scalable to large datasets. We also derive a novel order-free version that remedies the dependence of directed acyclic graphs on the ordering of the regions by averaging over all possible orderings. The resulting precision matrix is available in closed form. We demonstrate the superior performance of our models over competing models using simulation experiments and a public health application.
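As a rough illustration of why a directed acyclic graph construction yields a sparse, positive definite precision matrix, the sketch below builds Q = (I - B)^T F (I - B), where B is strictly lower triangular (each region regresses only on neighbours that come earlier in the ordering) and F is a positive diagonal matrix. Any such Q is positive definite by construction. The abstract does not state the model's formulas, so the specific weights in B and F, the function name `dag_precision`, and the four-region path graph are assumptions chosen for illustration and should be checked against the paper.

```python
import numpy as np

def dag_precision(adjacency, rho):
    """Return Q = (I - B)^T F (I - B) for regions taken in index order.

    adjacency : symmetric 0/1 matrix of neighbouring regions.
    rho       : dependence parameter with |rho| < 1.
    The weights below are an illustrative assumption, not quoted from the source.
    """
    adjacency = np.asarray(adjacency, dtype=bool)
    k = adjacency.shape[0]
    B = np.zeros((k, k))   # strictly lower triangular regression weights
    f = np.empty(k)        # conditional precisions (diagonal of F)
    for i in range(k):
        # Directed neighbours: neighbours that precede region i in the ordering.
        parents = np.flatnonzero(adjacency[i, :i])
        denom = 1.0 + (parents.size - 1) * rho**2
        B[i, parents] = rho / denom
        f[i] = denom / (1.0 - rho**2)
    L = np.eye(k) - B
    return L.T @ (f[:, None] * L)

# Hypothetical example: four regions arranged on a path 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
rho = 0.6
Q = dag_precision(A, rho)
cov = np.linalg.inv(Q)
min_eig = np.linalg.eigvalsh(Q).min()
```

With these weights and a path graph, the implied covariance has unit marginal variances and correlation `rho` between adjacent regions, which is one sense in which such a parameter can be read directly. Relabelling the regions changes B and therefore Q, which is the ordering dependence that the order-free version in the abstract is meant to address.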
