Feature selection and clustering via robust graph-Laplacian PCA based on capped L1-norm

In molecular biology, selecting feature genes and clustering tumors are active and difficult problems in bioinformatics research. Traditional PCA, which minimizes a squared-error loss, is sensitive to outliers and noise, so a method that weakens the effect of errors and noise is needed. In this paper, we propose a novel PCA method based on the graph Laplacian and the capped L1-norm, called CgLPCA. The method preserves the internal geometry of the data by introducing a graph-Laplacian term, and it applies the capped L1-norm to the loss function to improve robustness. Its main contribution is to preserve the nonlinear structure of the data while making the PCA-based method more robust. We solve the resulting optimization problem with the augmented Lagrange multiplier method. Among various PCA-based methods, CgLPCA achieves leading performance in feature selection and tumor clustering.
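The two ingredients named above can be illustrated with a minimal sketch. It is not the CgLPCA algorithm itself: the exact objective, the graph construction, and the solver are not given in this abstract. The sketch assumes a symmetric k-nearest-neighbor graph with binary weights, a capped L1 loss of the form sum_i min(||r_i||_2, eps) over per-sample residuals, and a trade-off weight lam. The function names, eps, lam, and k are all hypothetical.

```python
import numpy as np

def knn_graph_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - W of a symmetric k-NN graph over the rows of X.
    Assumption: binary edge weights; the paper may use a different weighting."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest samples
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the adjacency matrix
    return np.diag(W.sum(axis=1)) - W

def capped_l1_loss(residuals, eps):
    """Sum over samples of min(||r_i||_2, eps): each sample contributes at most eps."""
    norms = np.linalg.norm(residuals, axis=1)
    return np.minimum(norms, eps).sum()

def sketch_objective(X, U, V, L, eps, lam):
    """Assumed objective: capped reconstruction loss plus graph smoothness tr(V^T L V).
    X is n x p (samples by features), V is n x k (embedding), U is p x k (loadings)."""
    R = X - V @ U.T
    return capped_l1_loss(R, eps) + lam * np.trace(V.T @ L @ V)
```

The cap is what bounds the influence of an outlier: a sample with residual norm 5 and a sample with residual norm 500 both contribute exactly eps once they exceed it, whereas under a squared loss the second would dominate the objective.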
