Weighted principal component analysis: a weighted covariance eigendecomposition approach

We present a new, straightforward principal component analysis (PCA) method based on diagonalizing the weighted variance–covariance matrix with two spectral decomposition methods: power iteration and Rayleigh quotient iteration. The method retrieves a chosen number of orthogonal principal components, from among the most meaningful ones, for problems with weighted and/or missing data. The principal coefficients are then obtained by fitting the principal components to the data, which yields the final decomposition. Tests on real and simulated cases show that the method is optimal at identifying the most significant patterns within data sets. We illustrate its usefulness by assessing the quality of its extrapolation of Sloan Digital Sky Survey quasar spectra from the measured wavelengths to shorter and longer wavelengths. The algorithm also benefits from a fast and flexible implementation.
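The steps named above (weighted covariance, power iteration, coefficient fitting) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the pairwise weight normalization of the covariance, the starting vector, and the convention that a weight of 0 marks a missing value are all assumptions made for illustration. Only the dominant component is extracted, and Rayleigh quotient iteration is not shown.

```python
import math

def weighted_covariance(X, W):
    """Weighted mean and covariance of X (observations x variables).

    Assumption: W[i][j] is the weight of X[i][j], with 0 marking a missing
    value, and each covariance entry is normalized by the sum of weight products.
    """
    n_obs, n_var = len(X), len(X[0])
    mean = []
    for j in range(n_var):
        wsum = sum(W[i][j] for i in range(n_obs))
        total = sum(W[i][j] * X[i][j] for i in range(n_obs))
        mean.append(total / wsum if wsum > 0 else 0.0)
    cov = [[0.0] * n_var for _ in range(n_var)]
    for j in range(n_var):
        for k in range(n_var):
            num = sum(W[i][j] * W[i][k] * (X[i][j] - mean[j]) * (X[i][k] - mean[k])
                      for i in range(n_obs))
            den = sum(W[i][j] * W[i][k] for i in range(n_obs))
            cov[j][k] = num / den if den > 0 else 0.0
    return mean, cov

def power_iteration(C, n_iter=1000, tol=1e-12):
    """Dominant eigenpair of a symmetric matrix C by power iteration."""
    n = len(C)
    # Arbitrary non-uniform start (an assumption); it must not be
    # orthogonal to the dominant eigenvector.
    v = [float(j + 1) for j in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(C[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
        delta = max(abs(a - b) for a, b in zip(w, v))
        v = w
        if delta < tol:
            break
    # Rayleigh quotient v^T C v gives the eigenvalue estimate (v has unit norm).
    lam = sum(v[j] * sum(C[j][k] * v[k] for k in range(n)) for j in range(n))
    return lam, v

def fit_coefficient(x_row, w_row, mean, v):
    """Weighted least-squares coefficient of one observation on component v."""
    num = sum(w * c * (x - m) for x, w, m, c in zip(x_row, w_row, mean, v))
    den = sum(w * c * c for w, c in zip(w_row, v))
    return num / den if den > 0 else 0.0

# Toy data on the line y = 2x; the last row is entirely missing (weights 0).
X = [[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [100.0, -50.0]]
W = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
mean, cov = weighted_covariance(X, W)
eigenvalue, component = power_iteration(cov)
coefficient = fit_coefficient(X[2], W[2], mean, component)
```

In this sketch, further components could be obtained by deflating the covariance matrix (subtracting the eigenvalue times the outer product of the component with itself) and repeating the iteration. That is one common choice and is not necessarily the one made in the paper.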
