RPCA-Based Tumor Classification Using Gene Expression Data

Microarray techniques have been used to delineate cancer groups and to identify candidate genes for cancer prognosis. Because these problems can be cast as classification tasks, various classification methods have been applied to analyze and interpret gene expression data. In this paper, we propose a novel method based on robust principal component analysis (RPCA) to classify tumor samples from gene expression data. First, RPCA is used to highlight the characteristic genes associated with a specific biological process. Then, RPCA and RPCA+LDA (robust principal component analysis and linear discriminant analysis) are used to identify the features. Finally, a support vector machine (SVM) is applied to classify the tumor samples based on the identified features. Experiments on seven data sets demonstrate that our methods are effective and feasible for tumor classification.
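To make the first step concrete, the following is a minimal sketch of an RPCA decomposition, which splits a data matrix D into a low-rank part A and a sparse part E. It is not the authors' implementation. It assumes an inexact augmented Lagrange multiplier solver with commonly used default parameters, a genes-by-samples layout for D, and a hypothetical helper `top_genes` that ranks genes by the absolute row sums of E. The synthetic data stand in for a real expression matrix, and the names `rpca_ialm`, `soft_threshold`, and `top_genes` are illustrative only.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise shrinkage operator used for both E and the singular values."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500):
    """Split D into low-rank A and sparse E (D ~ A + E) with an inexact ALM scheme.

    Parameter defaults (lam, mu, rho) are common choices, assumed here.
    """
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_D = np.linalg.norm(D, "fro")
    spectral = np.linalg.norm(D, 2)
    # Initialise the multiplier Y and the penalty mu.
    Y = D / max(spectral, np.abs(D).max() / lam)
    mu = 1.25 / spectral
    mu_max = mu * 1e7
    rho = 1.5
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = (U * soft_threshold(s, 1.0 / mu)) @ Vt
        # Sparse update: element-wise shrinkage.
        E = soft_threshold(D - A + Y / mu, lam / mu)
        residual = D - A - E
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") / norm_D < tol:
            break
    return A, E


def top_genes(E, k):
    """Hypothetical ranking: indices of the k rows (genes) with the largest L1 norm in E."""
    scores = np.abs(E).sum(axis=1)
    return np.argsort(-scores)[:k]


# Synthetic stand-in for a genes-by-samples matrix: rank-2 background plus sparse spikes.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
S0 = np.zeros((60, 40))
mask = rng.random((60, 40)) < 0.05
S0[mask] = 5.0 * rng.standard_normal(mask.sum())
D = L0 + S0

A, E = rpca_ialm(D)
top = top_genes(E, 5)
```

In a full pipeline, the expression values of the selected genes (optionally projected with LDA) would then be passed to an SVM classifier; those steps are omitted here because the abstract does not specify their settings.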
