Bayesian Function-on-Scalars Regression for High-Dimensional Data

Abstract: We develop a fully Bayesian framework for function-on-scalars regression with many predictors. The functional data response is modeled nonparametrically using unknown basis functions, which produces a flexible and data-adaptive functional basis. We incorporate shrinkage priors that effectively remove unimportant scalar covariates from the model and reduce sensitivity to the number of (unknown) basis functions. For variable selection in functional regression, we propose a decision-theoretic posterior summarization technique, which identifies a subset of covariates that retains nearly the predictive accuracy of the full model. Our approach is broadly applicable to Bayesian functional regression models and, unlike existing methods, provides joint rather than marginal selection of important predictor variables. Computationally scalable posterior inference is achieved using a Gibbs sampler with linear time complexity in the number of predictors. The resulting algorithm is empirically faster than existing frequentist and Bayesian techniques, and provides joint estimation of model parameters, prediction and imputation of functional trajectories, and uncertainty quantification via the posterior distribution. A simulation study demonstrates improvements in estimation accuracy, uncertainty quantification, and variable selection relative to existing alternatives. The methodology is applied to actigraphy data to investigate the association between intraday physical activity and responses to a sleep questionnaire. Supplementary materials for this article are available online.
