Fast Algorithms and Theory for High-Dimensional Bayesian Varying Coefficient Models

Nonparametric varying coefficient (NVC) models are widely used for modeling time-varying effects on responses that are measured repeatedly. In this paper, we introduce the nonparametric varying coefficient spike-and-slab lasso (NVC-SSL) for Bayesian estimation and variable selection in NVC models. The NVC-SSL simultaneously selects the significant time-varying covariates and estimates their functionals, while also accounting for temporal correlations. Our model can be implemented using an efficient expectation-maximization (EM) algorithm, thus avoiding the computational burden of Markov chain Monte Carlo (MCMC) in high dimensions. We also introduce a simple method to make our model robust to misspecification of the temporal correlation structure. In contrast to frequentist NVC models, hardly anything is known about the large-sample properties of Bayesian NVC models. We take a step towards addressing this longstanding gap between methodology and theory by deriving posterior contraction rates for the NVC-SSL model under both correct specification and misspecification of the temporal correlation structure. Finally, we illustrate our methodology through simulation studies and data analysis. Our proposed method is implemented in the publicly available R package NVCSSL.
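
For orientation, the generic NVC model referred to above is commonly written as follows. This is a sketch in standard textbook notation, not necessarily the notation of the paper: the symbols n, n_i, p, d, x_{ik}, \beta_k, B_{kl}, and \gamma_{kl} are assumptions introduced here for illustration. The second line shows the usual basis-expansion device under which selecting a coefficient function amounts to selecting a group of basis coefficients.

```latex
% Response of subject i at its j-th observation time t_{ij},
% with p time-varying covariates and a within-subject correlated error process.
y_i(t_{ij}) = \sum_{k=1}^{p} x_{ik}(t_{ij}) \, \beta_k(t_{ij}) + \varepsilon_i(t_{ij}),
\qquad i = 1, \dots, n, \quad j = 1, \dots, n_i .

% Each smooth coefficient function is approximated by d basis functions
% (for example, B-splines), so that \beta_k \equiv 0 corresponds to
% the whole group \gamma_k = (\gamma_{k1}, \dots, \gamma_{kd}) being zero.
\beta_k(t) \approx \sum_{l=1}^{d} \gamma_{kl} \, B_{kl}(t),
\qquad k = 1, \dots, p .
```

Under this representation, variable selection in the NVC model becomes a group-sparsity problem over the vectors \gamma_1, \dots, \gamma_p, which is the setting in which spike-and-slab priors on groups of coefficients are typically applied.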
