Uncertainty quantification for principal component regression

Abstract: Principal component regression is an effective dimension reduction method for regression problems. To apply it in practice, one typically starts by selecting the number of principal components k, then estimates the corresponding regression parameters using, say, maximum likelihood, and finally obtains predictions from the fitted model. The success of this approach depends heavily on the choice of k, and because the data are often noisy, relying on a single value of k can be risky. Using the generalized fiducial inference framework, this paper develops a method for constructing a probability function on k, which provides an uncertainty measure on its value. In addition, this paper constructs novel confidence intervals for the regression parameters and prediction intervals for future observations. The proposed methodology is supported by theoretical results, tested in simulation experiments, and compared with other methods on real data. To the best of our knowledge, this is the first time that a full treatment of uncertainty quantification has been formally considered for principal component regression.
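
The three-step procedure described in the abstract (choose k, estimate the regression parameters, predict) can be illustrated with a minimal sketch. This is plain principal component regression, not the generalized fiducial method the paper proposes. It assumes Gaussian noise, so that least squares coincides with maximum likelihood, and the function names `pcr_fit` and `pcr_predict` and the simulated data are hypothetical, introduced only for illustration.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Fit principal component regression using the first k components."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered design matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                      # shape (p, k)
    Z = Xc @ V_k                        # component scores, shape (n, k)
    # Least squares on the scores (the MLE under Gaussian noise).
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V_k @ gamma                  # coefficients on the original scale
    return x_mean, y_mean, beta


def pcr_predict(model, X_new):
    """Predict responses for new rows from a fitted PCR model."""
    x_mean, y_mean, beta = model
    return y_mean + (X_new - x_mean) @ beta


# Simulated data (an assumption for illustration, not from the paper).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p)) * np.linspace(3.0, 0.5, p)
beta_true = rng.normal(size=p)
y = 1.0 + X @ beta_true + rng.normal(scale=0.5, size=n)

# Training error for several choices of k.
mses = {}
for k in (1, 3, 10):
    model = pcr_fit(X, y, k)
    mses[k] = float(np.mean((y - pcr_predict(model, X)) ** 2))
    print(f"k={k:2d}  training MSE={mses[k]:.4f}")
```

The fit changes with k, which is the sensitivity the abstract points to. Note that a point estimate like this carries no measure of uncertainty about k itself.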
