Penalized PCA approaches for B-spline expansions of smooth functional data

Functional principal component analysis (FPCA) is a dimension reduction technique that explains the dependence structure of a functional data set in terms of uncorrelated variables. In many applications the data are a set of smooth functions observed with error. In these cases the principal components are difficult to interpret because the estimated weight functions are highly variable and lack smoothness. The most common solution is to penalize the roughness of a function through the integral of its squared d-th order derivative. This paper proposes two alternative forms of penalized FPCA based on B-spline basis expansions of the sample curves and on a simpler discrete penalty (the P-spline penalty), which measures the roughness of a function by summing the squared d-th order differences between adjacent B-spline coefficients. The main difference between the two smoothed FPCA approaches is that the first applies the P-spline penalty in the least squares approximation of the sample curves in terms of a B-spline basis, whereas the second introduces the P-spline penalty into the orthonormality constraint of the algorithm that computes the principal components. Leave-one-out cross-validation is adapted to select the smoothing parameter for both approaches. A simulation study and an application to chemometric functional data are used to test the performance of the proposed smoothed approaches and to compare the results with non-penalized FPCA and regularized FPCA.
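As a rough illustration of the discrete penalty described above, the following sketch computes the sum of squared d-th order differences between adjacent coefficients. It is a minimal example under stated assumptions, not the paper's implementation: the function names `difference_matrix` and `pspline_penalty` and the sample coefficient vectors are hypothetical, and the coefficients are treated as if they were the B-spline coefficients of a single curve.

```python
def difference_matrix(n_coef, order):
    """Build the (n_coef - order) x n_coef matrix of order-th differences.

    Hypothetical helper: starting from the identity matrix, each pass
    replaces the rows by differences of consecutive rows, so that
    multiplying the result by a coefficient vector yields its
    order-th differences.
    """
    if not 0 <= order < n_coef:
        raise ValueError("order must satisfy 0 <= order < n_coef")
    rows = [[1.0 if i == j else 0.0 for j in range(n_coef)]
            for i in range(n_coef)]
    for _ in range(order):
        rows = [[hi - lo for lo, hi in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return rows


def pspline_penalty(coefs, order):
    """Sum of squared order-th differences between adjacent coefficients."""
    d_matrix = difference_matrix(len(coefs), order)
    diffs = [sum(d * a for d, a in zip(row, coefs)) for row in d_matrix]
    return sum(v * v for v in diffs)


# Illustrative coefficient vectors (assumed values, not data from the paper).
smooth = [0.0, 1.0, 2.0, 3.0, 4.0]   # linear trend in the coefficients
rough = [0.0, 1.0, 0.0, 1.0, 0.0]    # oscillating coefficients

print(pspline_penalty(smooth, 2))  # 0.0: second differences of a linear sequence vanish
print(pspline_penalty(rough, 2))   # 12.0: second differences are -2, 2, -2
```

In a penalized fit this quantity would be multiplied by a smoothing parameter and added to the least squares criterion, so oscillating coefficient sequences are discouraged while a sequence that is polynomial of degree below d incurs no penalty.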
