Exponential family functional data analysis via a low‐rank model

In many applications, non‐Gaussian data such as binary or count data are observed over a continuous domain, and a smooth underlying structure describes them. We develop a new functional data method for such data when the observations are regularly spaced on the continuous domain. Our method, referred to as Exponential Family Functional Principal Component Analysis (EFPCA), assumes that the data are generated from an exponential family distribution and that the matrix of canonical parameters has a low‐rank structure. The proposed method flexibly accommodates not only standard one‐way functional data but also two‐way (or bivariate) functional data. In addition, we introduce a new cross‐validation method for estimating the latent rank of a generalized data matrix. We demonstrate the efficacy of the proposed methods in a comprehensive simulation study. The method is also applied to UK mortality data, which are binomially distributed and two‐way functional across age groups and calendar years. The results offer novel insights into the underlying mortality pattern.
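To make the low‐rank assumption on the canonical parameters concrete, the following is a minimal sketch for the Bernoulli (binary) case, not the authors' EFPCA algorithm. It assumes the logit matrix factors as Theta = U V^T and fits U and V by plain gradient descent on the negative log‐likelihood. The function names (`fit_low_rank_logit`, `bernoulli_neg_loglik`), the step size, and the iteration count are hypothetical choices made for illustration; the sketch omits any smoothness penalty and any rank selection by cross validation.

```python
import numpy as np


def sigmoid(theta):
    """Inverse logit link: maps canonical parameters to probabilities."""
    return 0.5 * (1.0 + np.tanh(0.5 * theta))


def bernoulli_neg_loglik(Y, Theta):
    """Negative Bernoulli log-likelihood with canonical (logit) parameters Theta."""
    return float(np.sum(np.logaddexp(0.0, Theta) - Y * Theta))


def fit_low_rank_logit(Y, rank, n_iter=300, step=0.01, seed=0):
    """Fit Theta = U @ V.T of the given rank by gradient descent.

    Illustrative only: fixed step size, no smoothing, no convergence check.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    losses = []
    for _ in range(n_iter):
        Theta = U @ V.T
        losses.append(bernoulli_neg_loglik(Y, Theta))
        R = sigmoid(Theta) - Y          # gradient of the loss w.r.t. Theta
        grad_U = R @ V                  # chain rule through Theta = U V^T
        grad_V = R.T @ U
        U = U - step * grad_U
        V = V - step * grad_V
    return U, V, losses


if __name__ == "__main__":
    # Simulated binary data on a regular grid with a smooth rank-one logit surface
    # (the surface itself is an assumption made for this demo).
    rng = np.random.default_rng(1)
    a = np.linspace(-1.0, 1.0, 60)
    b = np.sin(2.0 * np.pi * np.linspace(0.0, 1.0, 40))
    theta_true = 3.0 * np.outer(a, b)
    Y = (rng.random(theta_true.shape) < sigmoid(theta_true)).astype(float)

    U, V, losses = fit_low_rank_logit(Y, rank=2)
    print("initial loss:", losses[0])
    print("final loss:  ", losses[-1])
```

Other exponential family members would change only the link and the log‐partition term: for Poisson counts, for instance, `sigmoid` would be replaced by `np.exp` and `np.logaddexp(0.0, Theta)` by `np.exp(Theta)`.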
