Paper information - Improved algorithms for high-dimensional robust PCA

Improved algorithms for high-dimensional robust PCA

Principal component analysis (PCA) is one of the most important dimensionality reduction methods and is widely used in satellite image analysis, face recognition, social network feature extraction, and other applications. However, its quadratic error criterion makes it fragile in the presence of outliers. Many robust PCA variants have been proposed to address this problem, but their output quality degrades dramatically in the high-dimensional regime, where the dimensionality of the dataset is comparable to the number of observations. High-dimensional robust PCA is therefore needed. HR-PCA and DHR-PCA are algorithms designed for this regime: both are robust and suited to high-dimensional spaces. However, HR-PCA suffers from low efficiency, and DHR-PCA loses useful information. This paper proposes two improved algorithms to address these issues. First, a preprocessing step is added before HR-PCA to reduce the burden on HR-PCA. Second, instead of decreasing the weights of all observations as DHR-PCA does, the weights are changed in different ways according to each observation's outlyingness. Simulation results indicate that the improved algorithms are robust and perform stably.
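The fragility of classical PCA mentioned in the abstract can be illustrated with a small NumPy sketch. This is not HR-PCA, DHR-PCA, or either of the improved algorithms from the paper; it only shows, on synthetic data chosen for illustration, how a few gross outliers can rotate the leading principal direction of ordinary PCA. The function name `top_component` and all data values are assumptions made for this example.

```python
import numpy as np

def top_component(X):
    """Return the leading principal direction (unit vector) of the rows of X."""
    Xc = X - X.mean(axis=0)  # center the observations
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)

# Illustrative inliers: 200 points spread mostly along the first coordinate axis.
inliers = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])

# Illustrative gross outliers placed far away along the second axis.
outliers = np.array([[0.0, 60.0], [0.0, -60.0], [0.0, 55.0]])

clean_dir = top_component(inliers)
corrupted_dir = top_component(np.vstack([inliers, outliers]))

# Absolute cosine with the true direction e1 = (1, 0); the sign of a
# singular vector is arbitrary, so only the absolute value is meaningful.
clean_alignment = abs(clean_dir[0])
corrupted_alignment = abs(corrupted_dir[0])
print(clean_alignment, corrupted_alignment)
```

Because the squared-error criterion weights each point by its squared distance from the mean, three distant points out of 203 are enough here to dominate the scatter and pull the leading direction away from the axis along which the inliers vary.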

[1] Gregory Piatetsky-Shapiro, et al. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality, 2000.

[2] Shie Mannor, et al. Outlier-Robust PCA: The High-Dimensional Case, 2013, IEEE Transactions on Information Theory.

[3] Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space, 1901.

[4] Shuicheng Yan, et al. Robust PCA in High-dimension: A Deterministic Approach, 2012, ICML.

[5] D. W. Scott, et al. Probability Density Estimation in Higher Dimensions, 2014.

[6] Victor J. Yohai, et al. The Behavior of the Stahel-Donoho Robust Multivariate Estimator, 1995.