Robust and Sparse Kernel PCA and Its Outlier Map

Kernel principal component analysis (kernel PCA) generalizes linear PCA to high-dimensional feature spaces that are related to the input space by a nonlinear map. The principal components can be computed efficiently via an eigendecomposition of the kernel matrix. Nevertheless, classical kernel PCA has two deficiencies: it is neither robust nor sparse. Outliers can affect it so strongly that the resulting eigenvectors are tilted toward them. Moreover, the technique is not sparse, since each principal component in the Hilbert space is expressed in terms of kernels associated with every training pattern. To overcome these issues, we propose a two-stage algorithm: a robust distance is first computed to identify the uncontaminated part of the data set, and the best-fit ellipsoid to these data is then estimated to obtain an informative and concise representation. Finally, we propose a kernel PCA outlier map to display and classify the outliers. Simulations with synthetic data show the effectiveness of our algorithm and its corresponding outlier map.
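As an illustration of the classical (non-robust, non-sparse) baseline described above, the following minimal sketch computes the leading kernel principal component from the centered kernel matrix. It is not the proposed two-stage algorithm. The Gaussian kernel, the bandwidth `sigma`, the toy data `X`, and the use of power iteration in place of a full eigendecomposition are all assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel (assumed here; any positive semidefinite kernel would do)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def centered_kernel_matrix(X, sigma=1.0):
    """Kernel matrix of X, centered in feature space."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so its column means equal its row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def leading_eigenpair(Kc, iters=500):
    """Largest eigenvalue and unit eigenvector of Kc via power iteration."""
    n = len(Kc)
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        lam = norm  # Kc is positive semidefinite, so the norm estimates the eigenvalue
    return lam, v

# Hypothetical toy data: a tight cluster plus one distant point acting as an outlier.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.4), (0.4, 0.3), (3.0, -3.0)]

Kc = centered_kernel_matrix(X, sigma=1.0)
lam, v = leading_eigenpair(Kc)

# Expansion coefficients of the first principal component in feature space,
# scaled so that the component has unit norm. There is one coefficient per
# training pattern, which is the non-sparseness noted in the text.
alpha = [vi / math.sqrt(lam) for vi in v]

# Scores of the training points on the first component.
scores = [sum(alpha[j] * Kc[i][j] for j in range(len(X))) for i in range(len(X))]
```

Inspecting `scores` for the toy data gives a feel for how a single distant point can influence the leading component, which is the sensitivity the robust first stage is meant to address.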