Semiparametric Functional Factor Models with Bayesian Rank Selection

Daniel R. Kowal and Antonio Canale

Abstract

Functional data are frequently accompanied by parametric templates that describe the typical shapes of the functions. Although the templates incorporate critical domain knowledge, parametric functional data models can incur significant bias, which undermines the usefulness and interpretability of these models. To correct for model misspecification, we augment the parametric templates with an infinite-dimensional nonparametric functional basis. Crucially, the nonparametric factors are regularized with an ordered spike-and-slab prior that provides consistent rank selection and satisfies several appealing theoretical properties. This prior is accompanied by a parameter expansion scheme customized to boost MCMC efficiency, and is broadly applicable for Bayesian factor models. The nonparametric basis functions are learned from the data, yet constrained to be orthogonal to the parametric template in order to preserve distinctness between the parametric and nonparametric terms. The versatility of the proposed approach is illustrated through applications to synthetic data, human motor control data, and dynamic yield curve data. Relative to parametric alternatives, the proposed semiparametric functional factor model eliminates bias, reduces excessive posterior and predictive uncertainty, and provides reliable inference on the effective number of nonparametric terms, all with minimal additional computational costs.
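The orthogonality constraint between the parametric template and the nonparametric basis can be illustrated with a small numerical sketch. This is not the authors' sampler: it only shows, under stated assumptions, how candidate basis functions evaluated on a grid can be projected onto the orthogonal complement of a template. The function name `orthogonalize_to_template`, the linear template `G`, the candidate basis `B`, and the use of a plain Euclidean inner product on the grid are all hypothetical choices made for illustration.

```python
import numpy as np


def orthogonalize_to_template(B, G):
    """Return an orthonormal basis for the part of span(B) orthogonal to span(G).

    B : (m, k) array, candidate nonparametric basis functions on a grid.
    G : (m, p) array, parametric template functions on the same grid.
    Orthogonality is with respect to the Euclidean inner product on the grid
    (an assumption; a quadrature-weighted inner product could be used instead).
    """
    # Orthonormal basis for the span of the template columns.
    Q, _ = np.linalg.qr(G)
    # Remove from each column of B its component inside the template span.
    B_perp = B - Q @ (Q.T @ B)
    # Orthonormalize what remains; columns stay orthogonal to the template.
    F, _ = np.linalg.qr(B_perp)
    return F


# Hypothetical example: a linear template {1, tau} and three candidate functions.
tau = np.linspace(0.0, 1.0, 50)
G = np.column_stack([np.ones_like(tau), tau])
B = np.column_stack([np.sin(2 * np.pi * tau), np.cos(2 * np.pi * tau), tau**2])

F = orthogonalize_to_template(B, G)
max_cross = np.max(np.abs(G.T @ F))  # largest inner product with the template
```

Here `max_cross` should be zero up to floating-point error, meaning that no column of `F` can reproduce any part of the template, which is the sense in which the parametric and nonparametric terms remain distinct.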
