PCA Based on Graph Laplacian Regularization and P-Norm for Gene Selection and Clustering

Identifying characteristic genes from gene expression data is both a focus and a persistent difficulty of modern molecular biology. Principal component analysis (PCA), in its traditional reconstruction-error-minimization form, is a matrix decomposition method that uses a quadratic error function, which is known to be sensitive to outliers and noise. A PCA method that remains reliable when outliers and noise are present is therefore needed. In this paper, we develop a novel PCA method, called PgLPCA, that applies a P-norm to the error function and adds a graph-Laplacian regularization term to the matrix decomposition problem. The core of the method, designed to reduce the influence of outliers and noise, is a new error function based on the non-convex proximal P-norm. In addition, the Laplacian regularization term captures the intrinsic geometric structure of the data in the learned representation. To solve the minimization problem, we develop an efficient optimization algorithm based on the augmented Lagrange multiplier method. We apply the method to select characteristic genes and to cluster samples from rapidly growing biological data, where it achieves higher accuracy than the compared methods.
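The abstract does not state the objective function, so the following is only a minimal sketch of the kind of objective it describes: a P-norm reconstruction error plus a graph-Laplacian smoothness term. Everything here is an assumption made for illustration: the entrywise reading of the P-norm (a Schatten-style P-norm would be another reading), the k-nearest-neighbor graph with binary weights, the trade-off weight `alpha`, and the function names `knn_graph_laplacian` and `pglpca_objective`. The augmented Lagrange multiplier solver is not reproduced; the sketch only evaluates the objective at a plain PCA solution.

```python
import numpy as np

def knn_graph_laplacian(X, n_neighbors=3):
    """Unnormalized Laplacian L = D - W of a symmetric kNN graph.

    X has shape (n_features, n_samples); graph nodes are the samples (columns).
    Binary edge weights are an assumption made for this sketch.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances between samples.
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize: keep an edge if either end selects it
    return np.diag(W.sum(axis=1)) - W

def pglpca_objective(X, U, V, L, p=0.5, alpha=1.0):
    """Assumed objective: sum_ij |X - U V^T|_ij^p + alpha * tr(V^T L V)."""
    residual = X - U @ V.T
    fit = np.sum(np.abs(residual) ** p)   # entrywise P-norm error, 0 < p <= 1
    smooth = np.trace(V.T @ L @ V)        # graph-Laplacian regularizer
    return fit + alpha * smooth

# Usage on synthetic data: 50 "genes" by 12 "samples".
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))
L = knn_graph_laplacian(X, n_neighbors=3)

# Plain PCA via SVD, used only as a point at which to evaluate the objective.
k = 2
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:k].T          # (n_samples, k), orthonormal columns
U = X @ V             # (n_features, k)
value = pglpca_objective(X, U, V, L, p=0.5, alpha=0.1)
print(f"objective at the PCA solution: {value:.3f}")
```

With `p` below 1, a large residual entry contributes less to the error term than it would under the quadratic error, which is the intuition behind the robustness to outliers claimed above.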
