A p-norm singular value decomposition method for robust tumor clustering

Tumor clustering based on biomolecular data plays an important role in cancer class discovery. To further improve the robustness, stability and accuracy of tumor clustering, we develop a novel dimension reduction method named p-norm singular value decomposition (PSVD), which seeks a low-rank approximation matrix to the biomolecular data. To enhance robustness to outliers, the Lp-norm is taken as the error function and the Schatten p-norm is used as the regularization function in our optimization model. To evaluate the performance of PSVD, the K-means clustering method is then applied to the low-rank approximation matrix for tumor clustering. Extensive experiments are performed on gene expression and cancer genome datasets. All experimental results demonstrate that the PSVD-based method outperforms many existing methods. In particular, the experiments show that the proposed method processes higher-dimensional data efficiently, with good robustness and superior running time.
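To make the structure of the optimization model concrete, the following is a minimal sketch of how an objective of this kind could be evaluated. It is not the authors' algorithm: the abstract does not give the exact formulation or the solver, so the shared exponent `p`, the weight `lam`, the use of p-th powers of both norms, and all function names are assumptions made for illustration. A plain truncated SVD is used only as a stand-in for the low-rank approximation that PSVD would compute.

```python
import numpy as np


def lp_norm_p(E, p):
    """Entrywise Lp-norm raised to the p-th power: sum of |e_ij|^p."""
    return float(np.sum(np.abs(E) ** p))


def schatten_p_norm_p(X, p):
    """Schatten p-norm raised to the p-th power: sum of sigma_i^p."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(sigma ** p))


def psvd_style_objective(D, X, p=1.0, lam=1.0):
    """Assumed objective: Lp error term plus weighted Schatten-p regularizer.

    D is the data matrix, X a candidate low-rank approximation.
    The weight `lam` and the shared exponent `p` are illustrative choices.
    """
    return lp_norm_p(D - X, p) + lam * schatten_p_norm_p(X, p)


def truncated_svd(D, rank):
    """Best rank-`rank` approximation in the least-squares sense.

    Stand-in only: this is ordinary SVD truncation, not the PSVD solver.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


# Synthetic rank-3 data with one gross outlier (hypothetical example data).
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 50))
D[0, 0] += 100.0

X = truncated_svd(D, rank=3)
objective_value = psvd_style_objective(D, X, p=1.0, lam=0.1)
```

In a full pipeline, the rows or columns of `X` that correspond to samples would then be passed to a K-means routine, as the abstract describes. Choosing `p` at or below 1 reduces the weight given to large residuals compared with a squared error, which is the usual motivation for such norms when outliers are present.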
