The Basics of Linear Principal Components Analysis

When you have obtained measures on a large number of variables, those variables may be redundant. Redundancy means that some of the variables are correlated with one another, possibly because they measure the same “thing”. Because of this redundancy, it should be possible to reduce the observed variables to a smaller number of variables. For example, if a group of variables are strongly correlated with one another, you do not need all of them in your analysis; one is enough, since the behavior of the others can largely be predicted from it. This raises the central issue of how to select or build a representative variable for each group of correlated variables. The simplest way is to keep one variable and discard all the others, but this is rarely satisfactory because it throws away whatever additional information the discarded variables carry. An alternative is to combine the variables in some way, for instance by taking a weighted average, along the lines of the well-known Human Development Index published by UNDP. However, such an approach raises the basic question of how to set appropriate weights. If one has sufficient insight into the nature and magnitude of the interrelations among the variables, one might choose the weights using one's own judgment. This introduces a certain amount of subjectivity into the analysis and may be questioned by practitioners. To overcome this shortcoming, another approach is to let the data themselves reveal the relevant weights of the variables. Principal Components Analysis (PCA) is a variable reduction method that can be used to achieve this goal. Technically, this method delivers a relatively small set of synthetic variables, called principal components, that account for most of the variance in the original dataset.
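To make the idea of data-driven weights concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the procedure of any particular software package: the function name `pca_weights`, the simulated data, and the choice to standardize the variables (so that the analysis works on the correlation matrix) are all assumptions introduced for the example.

```python
import numpy as np

def pca_weights(X):
    """Illustrative PCA on standardized variables (hypothetical helper).

    Returns the eigenvalues in descending order, the weight vectors
    (one column per component), and the component scores.
    """
    # Standardize each variable: zero mean, unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the variables (columns).
    R = np.corrcoef(Z, rowvar=False)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each component is a weighted sum of the standardized variables.
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores

# Simulated example: four noisy measurements of one underlying quantity.
rng = np.random.default_rng(0)
common = rng.normal(size=200)
X = np.column_stack([common + 0.3 * rng.normal(size=200) for _ in range(4)])

eigvals, weights, scores = pca_weights(X)
explained = eigvals / eigvals.sum()  # share of total variance per component
print("weights of first component:", weights[:, 0].round(3))
print("share of variance explained:", explained.round(3))
```

In this simulated setting the four variables are strongly correlated, so the first component should carry most of the total variance, and its weights are determined by the data rather than chosen by hand. The sign of each weight vector is arbitrary, so only the relative sizes of the weights are meaningful.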
