Flexible Principal Component Analysis for Exponential Family Distributions

Traditional principal component analysis (PCA) is well known in high-dimensional data analysis, but it requires the data to be expressed as a matrix of continuous observations. To overcome these limitations, a new method called flexible PCA (FPCA) for exponential family distributions is proposed. The goal is to ensure that it can be applied to arbitrarily shaped regions with either count or continuous observations. The methodology of FPCA is developed under the framework of generalized linear models, which provides statistical models for FPCA that are not limited to matrix expressions of the data. A maximum likelihood approach is proposed to derive the decomposition when the number of principal components (PCs) is known. This naturally induces a penalized likelihood approach to determine the number of PCs when it is unknown. After being modified for missing data problems, the proposed method is compared with previous PCA methods for missing data. In the simulation study, FPCA always performs better than its competitors. The application uses the proposed method to reduce the dimensionality of arbitrarily shaped sub-regions of images and of the global spread patterns of COVID-19, under normal and Poisson distributions, respectively. AMS 2000 Subject Classification: 62H25; 62H35; 62J05
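To make the likelihood-based idea concrete, the following is a minimal sketch of a low-rank Poisson fit restricted to observed cells, followed by a BIC-style criterion for comparing candidate numbers of PCs. It is not the authors' algorithm: the log link with linear predictor U V^T, the gradient ascent with backtracking, the function names `fit_lowrank_poisson` and `bic_like`, and the degrees-of-freedom count k(n + p - k) are all assumptions made for illustration only.

```python
import numpy as np

def poisson_loglik(Y, mask, eta):
    # Poisson log-likelihood over observed cells only; the log(y!) term is
    # dropped because it does not depend on the parameters.
    with np.errstate(over="ignore", invalid="ignore"):
        return float(np.sum((Y * eta - np.exp(eta))[mask]))

def fit_lowrank_poisson(Y, mask, k, n_iter=200, seed=0):
    """Hypothetical helper: maximize the masked Poisson likelihood of
    log(mu) = U @ V.T by gradient ascent with a backtracking step size."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((p, k))
    ll = poisson_loglik(Y, mask, U @ V.T)
    history = [ll]
    step = 1e-2
    for _ in range(n_iter):
        # Score with respect to the linear predictor; zero on unobserved cells.
        G = np.where(mask, Y - np.exp(U @ V.T), 0.0)
        gU, gV = G @ V, G.T @ U
        while step > 1e-12:
            U_new, V_new = U + step * gU, V + step * gV
            ll_new = poisson_loglik(Y, mask, U_new @ V_new.T)
            if np.isfinite(ll_new) and ll_new >= ll:
                break  # accept: the likelihood did not decrease
            step *= 0.5
        else:
            break  # no acceptable step was found; stop iterating
        U, V, ll = U_new, V_new, ll_new
        history.append(ll)
        step *= 2.0  # try a larger step on the next iteration
    return U, V, history

def bic_like(loglik, mask, k):
    # Generic BIC-style penalty (an assumption, not the paper's criterion):
    # -2 * loglik + log(number of observed cells) * k * (n + p - k).
    n, p = mask.shape
    return -2.0 * loglik + np.log(mask.sum()) * k * (n + p - k)
```

Because the mask can mark any subset of cells as observed, the same sketch covers both missing data and non-rectangular regions embedded in a bounding matrix. One would fit several values of `k` and keep the one with the smallest `bic_like` value; swapping the Poisson terms for Gaussian ones gives the continuous case.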
