Robust, Adaptive Functional Regression in Functional Mixed Model Framework

Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, and so is not robust. In this article, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effect, random effect, and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to provide only the functional data and design matrices. It is efficient enough to handle large datasets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g., images), and other invertible transformations as alternatives to wavelets. Supplementary materials for this article are available online.
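
The scale mixture idea mentioned above can be illustrated with a minimal sketch. It is not the R-FMM implementation, and the exponential mixing distribution is an assumption chosen only because it gives a well-known result: a normal variable whose variance is drawn from an exponential distribution with mean 2b^2 is marginally Laplace with scale b, and so has heavier tails than a Gaussian. The function names `sample_scale_mixture` and `kurtosis` are hypothetical helpers introduced for this illustration.

```python
import numpy as np

def sample_scale_mixture(n, b=1.0, seed=0):
    """Draw from a scale mixture of normals (illustrative assumption:
    exponential mixing on the variance, which yields a Laplace(b) marginal)."""
    rng = np.random.default_rng(seed)
    # Latent variances: exponential with mean 2 * b**2.
    v = rng.exponential(scale=2.0 * b**2, size=n)
    # Conditionally Gaussian draws given each latent variance.
    x = rng.normal(0.0, np.sqrt(v))
    return x, v

def kurtosis(x):
    """Sample kurtosis (fourth standardized moment); about 3 for a Gaussian."""
    z = x - x.mean()
    return float(np.mean(z**4) / np.mean(z**2) ** 2)

x, v = sample_scale_mixture(200_000)
g = np.random.default_rng(1).normal(size=200_000)

print("scale-mixture kurtosis:", round(kurtosis(x), 2))
print("Gaussian kurtosis:     ", round(kurtosis(g), 2))
```

In a hierarchical model of this kind, observations with large residuals tend to be assigned large latent variances, which is one way to understand how outlying curves or regions receive less weight in the estimates.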
