Generalized multilevel function‐on‐scalar regression and principal component analysis

This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject‐specific and subject‐day‐specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within‐function and within‐subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. In simulations designed to mimic the application, the proposed method shows good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross‐sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time‐specific change in the probability of being active over a 24‐hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects from one another and days within subjects.
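
As a rough illustration of the data structure described above, the following Python sketch simulates binary curves whose logit-scale mean combines a fixed effect of one scalar covariate with a subject-level and a subject-day-level principal component. It is only a sketch under stated assumptions, not the authors' implementation: the function name `simulate_binary_curves`, the sine and cosine shapes chosen for the coefficient and basis functions, the score variances, and the use of a single component per level are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_binary_curves(n_subjects=50, n_days=5, n_grid=100, seed=0):
    """Simulate binary multilevel functional data on a common grid.

    All functional forms below are hypothetical and chosen only to
    illustrate the model structure (logit link, one PC per level).
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)

    # Hypothetical fixed-effect coefficient functions (intercept and slope).
    beta0 = -0.5 + np.sin(2 * np.pi * t)
    beta1 = 0.5 * np.cos(2 * np.pi * t)

    # Hypothetical principal component basis functions:
    # psi_subj captures between-subject variation,
    # psi_day captures day-to-day variation within a subject.
    psi_subj = np.sqrt(2) * np.sin(4 * np.pi * t)
    psi_day = np.sqrt(2) * np.cos(4 * np.pi * t)

    # One scalar covariate per subject, plus scores at each level.
    x = rng.normal(size=n_subjects)
    c_subj = rng.normal(scale=1.0, size=n_subjects)
    c_day = rng.normal(scale=0.5, size=(n_subjects, n_days))

    # Linear predictor with shape (subject, day, grid point).
    eta = (
        beta0[None, None, :]
        + x[:, None, None] * beta1[None, None, :]
        + c_subj[:, None, None] * psi_subj[None, None, :]
        + c_day[:, :, None] * psi_day[None, None, :]
    )

    # Inverse logit link gives the probability of being "active".
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, prob)
    return t, x, prob, y


t, x, prob, y = simulate_binary_curves()
print(y.shape)  # (subjects, days, grid points)
```

Estimation would run in the opposite direction: given `y` and `x`, the coefficient functions, basis functions, and scores are unknown and, in the approach summarized above, are represented with penalized splines and sampled in Stan.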
