Bayesian Variable Selection in Double Generalized Linear Tweedie Spatial Process Models

Double generalized linear models provide a flexible framework for modeling data by allowing both the mean and the dispersion to vary across observations. Common members of the exponential dispersion family, including the Gaussian, Poisson, compound Poisson-gamma (CP-g), gamma and inverse-Gaussian distributions, are known to admit such models. Their limited use can be attributed to ambiguities in model specification when the number of covariates is large and to complications that arise when data display complex spatial dependence. In this work we consider a hierarchical specification for the CP-g model with a spatial random effect. The spatial effect supports uncertainty quantification by modeling the dependence that arises from location-based indexing of the response. We focus on a Gaussian process specification for the spatial effect. Simultaneously, we tackle the problem of model specification for such models using Bayesian variable selection, effected through a continuous spike and slab prior on the model parameters, specifically the fixed effects. The novelty of our contribution lies in the Bayesian frameworks developed for such models. We perform various synthetic experiments to showcase the accuracy of our frameworks. They are then applied to analyze automobile insurance premiums in Connecticut for the year 2008.
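To make the data-generating structure concrete, the following is a minimal simulation sketch and not the authors' implementation. It assumes log links for both the mean and the dispersion, an exponential covariance for the Gaussian process, a fixed Tweedie power p between 1 and 2, and sparse fixed effects standing in for what a spike and slab prior would select; none of these choices is stated in the abstract. All names (`cpg_params`, `simulate_cpg`, `beta`, `gamma_coef`) are hypothetical. The sketch relies on the standard representation of a CP-g variable with mean mu and variance phi * mu^p as a Poisson number of gamma summands.

```python
import numpy as np

def cpg_params(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to Poisson rate, gamma shape, gamma scale."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # expected number of summands
    alpha = (2.0 - p) / (p - 1.0)               # shape of each gamma summand
    scale = phi * (p - 1.0) * mu ** (p - 1.0)   # scale of each gamma summand
    return lam, alpha, scale

def simulate_cpg(mu, phi, p, rng):
    """Draw compound Poisson-gamma responses; exact zeros occur when the count is 0."""
    lam, alpha, scale = cpg_params(mu, phi, p)
    counts = rng.poisson(lam)
    y = np.zeros_like(mu, dtype=float)
    pos = counts > 0
    # A sum of n iid Gamma(alpha, scale) variables is Gamma(n * alpha, scale).
    y[pos] = rng.gamma(shape=counts[pos] * alpha, scale=scale[pos])
    return y

rng = np.random.default_rng(0)
n_obs, p = 200, 1.5

# Assumed spatial effect: zero-mean Gaussian process with exponential covariance.
coords = rng.uniform(size=(n_obs, 2))
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
K = 0.5 * np.exp(-dists / 0.3) + 1e-8 * np.eye(n_obs)   # small jitter for stability
w = np.linalg.cholesky(K) @ rng.standard_normal(n_obs)

# Assumed sparse fixed effects for the mean and dispersion models.
X = rng.standard_normal((n_obs, 3))
beta = np.array([1.0, 0.0, -0.5])
gamma_coef = np.array([0.5, 0.0, 0.0])

mu = np.exp(X @ beta + w)       # log link for the mean, with spatial effect
phi = np.exp(X @ gamma_coef)    # log link for the dispersion
y = simulate_cpg(mu, phi, p, rng)
```

The zero entries of `beta` and `gamma_coef` mark covariates that a variable selection procedure would ideally exclude, and the mixture of exact zeros and positive values in `y` mirrors the semi-continuous shape typical of insurance data.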
