Molecular Diagnosis of Tumor Based on Independent Component Analysis and Support Vector Machines

Gene expression data gathered from tissue samples are expected to significantly improve the development of efficient tumor diagnosis. For more accurate tumor classification, extracting discriminant components from thousands of genes is an important problem, and it is a challenging one because the number of genes is large while the sample size is small. We propose a novel approach that combines a revised feature score criterion with the recently developed technique of independent component analysis (ICA) to further improve the classification performance of support vector machines (SVMs) on gene expression data. Two gene expression datasets (a colon tumor dataset and a leukemia dataset) are examined to confirm that the proposed approach can extract a small number of independent components, which drastically reduces the dimensionality of the original gene expression data while retaining a high recognition rate. For example, in our experiments 100% cross-validation accuracy was achieved on the leukemia dataset by extracting only 2 or 3 independent components.
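The component-selection step described above can be illustrated with a minimal sketch. The abstract does not define the revised feature score criterion, so the score used here, a signal-to-noise ratio |mean_a - mean_b| / (sd_a + sd_b), is an assumed stand-in, and the function names `feature_score` and `rank_components` and the toy data are hypothetical. The ICA decomposition and the SVM classifier are not shown; the sketch only ranks already-extracted components by how well they separate two classes.

```python
from statistics import mean, stdev


def feature_score(values, labels):
    """Score one component with a signal-to-noise ratio (assumed stand-in,
    not the paper's revised criterion): |mean_a - mean_b| / (sd_a + sd_b)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    a = [v for v, y in zip(values, labels) if y == classes[0]]
    b = [v for v, y in zip(values, labels) if y == classes[1]]
    spread = stdev(a) + stdev(b)
    if spread == 0:
        # Degenerate case: no within-class variation at all.
        return float("inf") if mean(a) != mean(b) else 0.0
    return abs(mean(a) - mean(b)) / spread


def rank_components(components, labels, keep):
    """Return the indices of the `keep` highest-scoring components and all scores.

    `components` holds one list per component, with one value per sample.
    """
    scores = [feature_score(c, labels) for c in components]
    order = sorted(range(len(components)), key=lambda i: scores[i], reverse=True)
    return order[:keep], scores


# Toy data (hypothetical): 3 components measured on 6 samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
components = [
    [0.1, 0.3, 0.2, 0.2, 0.1, 0.3],     # same distribution in both classes
    [2.0, 2.2, 1.8, -2.0, -1.9, -2.1],  # separates the classes clearly
    [1.0, 0.0, 2.0, 0.0, 1.0, -1.0],    # weak separation
]
top, scores = rank_components(components, labels, keep=2)
print(top)
```

In a full pipeline the retained components would then be passed to a classifier such as an SVM and evaluated by cross-validation, as the abstract describes.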
