Constrained Maximum Variance Mapping for Tumor Classification

Classifying gene expression data into different classes is of great importance. In this paper, following the assumption that tumor gene expression data may be sampled from a probability distribution on a sub-manifold of the ambient space, an efficient feature extraction method named Constrained Maximum Variance Mapping (CMVM) is presented for tumor classification. The proposed algorithm can be viewed as a linear approximation of a multi-manifold learning approach that takes the local geometry and manifold labels into account. CMVM was tested on four DNA microarray datasets, and the experimental results demonstrated that it is efficient for tumor classification.
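To make the idea of a label-aware, locality-constrained linear projection concrete, the following is a minimal sketch and not the authors' exact formulation. It assumes one plausible reading of the description above: build a same-class nearest-neighbor graph to capture local geometry, build a between-class graph from the labels, and choose the linear directions that maximize between-class spread relative to the local spread by solving a generalized eigenproblem. The function name `label_aware_projection`, its parameters (`n_neighbors`, `n_components`, `reg`), the ridge regularization, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def label_aware_projection(X, y, n_neighbors=3, n_components=1, reg=1e-3):
    """Hypothetical sketch: linear map that spreads classes apart while
    keeping same-class neighbors close. X is (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    Xc = X - X.mean(axis=0)

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum((Xc[:, None, :] - Xc[None, :, :]) ** 2, axis=-1)

    # Local graph W (assumption): link each sample to its nearest
    # neighbors that share its label.
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(sq[i])
        same = [j for j in order if j != i and y[j] == y[i]][:n_neighbors]
        W[i, same] = 1.0
    W = np.maximum(W, W.T)  # symmetrize

    # Between-class graph H (assumption): link samples with different labels.
    H = (y[:, None] != y[None, :]).astype(float)

    # Graph Laplacians, then scatter matrices in feature space.
    L_local = np.diag(W.sum(axis=1)) - W
    L_between = np.diag(H.sum(axis=1)) - H
    S_between = Xc.T @ L_between @ Xc
    S_local = Xc.T @ L_local @ Xc + reg * np.eye(d)  # ridge keeps it invertible

    # Solve S_between a = lambda * S_local a via a Cholesky reduction
    # to an ordinary symmetric eigenproblem.
    C = np.linalg.cholesky(S_local)
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_between @ C_inv.T
    M = (M + M.T) / 2.0
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:n_components]
    return C_inv.T @ vecs[:, top]


# Synthetic stand-in for expression data: two classes, five features,
# separated along the first feature only.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(20, 5))
X1 = rng.normal(size=(20, 5))
X1[:, 0] += 6.0
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)

A = label_aware_projection(X, y)
Z = (X - X.mean(axis=0)) @ A  # one-dimensional embedding
gap = abs(Z[y == 0].mean() - Z[y == 1].mean())
spread = Z[y == 0].std() + Z[y == 1].std()
```

On real microarray data, where the number of genes far exceeds the number of samples, a preliminary dimensionality reduction step (for example PCA) would likely be needed before solving the eigenproblem; that step is omitted here for brevity.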
