Principal components analysis for sparsely observed correlated functional data using a kernel smoothing approach

In this paper, we consider the problem of estimating the covariance kernel and its eigenvalues and eigenfunctions from sparse, irregularly observed, noise-corrupted and (possibly) correlated functional data. We present a method based on pre-smoothing the individual sample curves with an appropriate kernel. We show that the naive empirical covariance of the pre-smoothed sample curves gives a highly biased estimator of the covariance kernel along its diagonal. We address this problem by estimating the diagonal and off-diagonal parts of the covariance kernel separately. We then present a practical and efficient method for choosing the kernel bandwidth, based on an approximation to the leave-one-curve-out cross-validation score. We prove that, under standard regularity conditions on the covariance kernel and assuming i.i.d. samples, the risk of our estimator under $L^2$ loss achieves the optimal nonparametric rate when the number of measurements per curve is bounded. We also show that, even when the sample curves are correlated in such a way that the noiseless data has a separable covariance structure, the proposed method remains consistent, and we quantify the role of this correlation in the risk of the estimator.
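The diagonal bias mentioned above can be illustrated with a small simulation. The sketch below is not the estimator proposed in the paper; it is a simplified kernel-weighted (Nadaraya-Watson type) covariance smoother, written under the assumptions of mean-zero curves, a Gaussian kernel, a fixed bandwidth `h = 0.1`, and a hypothetical rank-one model $X_i(t) = \xi_i \sqrt{2}\sin(\pi t)$ with i.i.d. noise. The function name `covariance_estimates` and all simulation settings are illustrative choices. The "naive" version uses all pairs of observations within a curve, including each observation paired with itself, so the noise variance leaks into the estimate near the diagonal; the "off-diagonal" version drops those same-point pairs.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, used here as the smoothing kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def covariance_estimates(times, values, grid, h):
    """Kernel-weighted estimates of C(s, t) on grid x grid.

    times, values: lists of 1-D arrays, one pair per curve (mean zero assumed).
    Returns (naive, offdiag): 'naive' uses all within-curve pairs (j, j'),
    'offdiag' excludes the same-point pairs j == j'.
    """
    G = len(grid)
    num_all = np.zeros((G, G))
    den_all = np.zeros((G, G))
    num_same = np.zeros((G, G))
    den_same = np.zeros((G, G))
    for t_i, y_i in zip(times, values):
        # W[g, j] = K((grid[g] - T_ij) / h)
        W = gaussian_kernel((grid[:, None] - t_i[None, :]) / h)
        WY = W * y_i[None, :]
        a = WY.sum(axis=1)
        w = W.sum(axis=1)
        # Sums over all pairs (j, j') factorize into outer products.
        num_all += np.outer(a, a)
        den_all += np.outer(w, w)
        # Contribution of the same-point pairs j == j'.
        num_same += WY @ WY.T
        den_same += W @ W.T
    naive = num_all / den_all
    offdiag = (num_all - num_same) / (den_all - den_same)
    return naive, offdiag

# Hypothetical simulation: sparse design with 5 noisy observations per curve.
rng = np.random.default_rng(0)
n_curves, m, sigma = 1000, 5, 2.0
times = [rng.uniform(0.0, 1.0, size=m) for _ in range(n_curves)]
scores = rng.standard_normal(n_curves)
values = [s * np.sqrt(2.0) * np.sin(np.pi * t) + sigma * rng.standard_normal(m)
          for s, t in zip(scores, times)]

grid = np.linspace(0.0, 1.0, 21)
naive, offdiag = covariance_estimates(times, values, grid, h=0.1)

mid = 10  # grid[10] == 0.5, where the true C(t, t) equals 2 in this model
print("naive estimate at (0.5, 0.5):       ", naive[mid, mid])
print("off-diagonal estimate at (0.5, 0.5):", offdiag[mid, mid])
```

In this toy setting the naive estimate near the diagonal is inflated by the noise variance, whereas the version built only from distinct observation pairs is not. The paper's treatment of the diagonal and its bandwidth selection by approximate cross-validation are not reproduced here.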
