Feature Extraction from Tumor Gene Expression Profiles Using DCT and DFT

Feature extraction plays a key role in tumor classification based on gene expression profiles and can improve classifier performance. We design two novel feature extraction methods to extract tumor-related features. The first combines gene ranking and the discrete cosine transform (DCT) with principal component analysis (PCA); the second combines gene ranking and the discrete Fourier transform (DFT) with PCA. Both methods classify the tumor datasets successfully and effectively. Experiments with support vector machine (SVM) and K-nearest neighbor (K-NN) classifiers on two well-known tumor datasets show that the resulting classification performance is very stable. A 4-fold cross-validated accuracy of 100% is obtained for the leukemia dataset and 96.77% for the colon tumor dataset. Compared with other related work, the proposed methods not only achieve higher classification accuracy but also perform more stably.
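The pipeline described above (rank genes, transform the ranked expression vector with a DCT or DFT, reduce with PCA, then classify under 4-fold cross-validation) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the ranking criterion, so an absolute two-sample t-statistic is assumed. For the DFT variant, the magnitude spectrum of `np.fft.rfft` is assumed. A 3-nearest-neighbor classifier stands in for the K-NN/SVM stage. The gene and component counts are illustrative, and all function names (`rank_genes`, `dct2`, `dft_mag`, `pca_fit`, `knn_predict`, `cv_accuracy`) are hypothetical. The demo uses synthetic data, not the leukemia or colon datasets.

```python
import numpy as np


def rank_genes(X, y, k):
    """Indices of the k genes with the largest |t| (ASSUMED ranking criterion)."""
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)) + 1e-12
    return np.argsort(-np.abs(num / den))[:k]  # ordered from best to worst rank


def dct2(X):
    """Orthonormal DCT-II applied to each row (one ranked-gene vector per sample)."""
    n = X.shape[1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return X @ C.T


def dft_mag(X):
    """DFT magnitude spectrum of each row (ASSUMED: phase is discarded)."""
    return np.abs(np.fft.rfft(X, axis=1))


def pca_fit(Z, m):
    """Mean and top-m principal directions of the training features."""
    mu = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mu, full_matrices=False)
    return mu, Vt[:m].T


def knn_predict(Ztr, ytr, Zte, k=3):
    """Majority vote among the k nearest training samples (binary labels 0/1)."""
    d = ((Zte[:, None, :] - Ztr[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) > 0.5).astype(int)


def cv_accuracy(X, y, transform, n_genes=50, n_comp=5, folds=4, seed=0):
    """Cross-validated accuracy of: gene ranking -> transform -> PCA -> K-NN."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    correct = 0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        genes = rank_genes(X[train], y[train], n_genes)  # rank on training fold only
        Ftr = transform(X[train][:, genes])
        Fte = transform(X[test][:, genes])
        mu, W = pca_fit(Ftr, n_comp)  # PCA fitted on the training fold only
        pred = knn_predict((Ftr - mu) @ W, y[train], (Fte - mu) @ W)
        correct += int((pred == y[test]).sum())
    return correct / len(y)


if __name__ == "__main__":
    # Synthetic data: 60 samples x 500 genes, 20 genes shifted in class 1.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 30)
    X = rng.normal(size=(60, 500))
    X[y == 1, :20] += 2.0
    print("DCT + PCA + K-NN:", cv_accuracy(X, y, dct2))
    print("DFT + PCA + K-NN:", cv_accuracy(X, y, dft_mag))
```

Gene ranking and PCA are refitted inside each fold so that the held-out samples never influence feature selection. Because the ranking fixes the order of genes, it also fixes the "signal" that the DCT or DFT operates on. Swapping `knn_predict` for an SVM would give the other classifier setting mentioned in the abstract.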
