Improving the performance of ICA based microarray data prediction models with genetic algorithm

Diagnosing tumor type precisely from microarray data is challenging because the number of variables p (genes) is far larger than the number of samples n. Many independent component analysis (ICA) based models have been proposed to tackle the microarray data classification problem with great success. Although it has been pointed out that different independent components (ICs) differ in biological significance, how to select the IC subset that best predicts new samples remains largely unexplored. We try to improve the performance of ICA based classification models by using proper IC subsets instead of all the ICs. This paper proposes a genetic algorithm (GA) based selection process in which each selected IC subset is evaluated by leave-one-out cross validation (LOOCV). The experimental results demonstrate that our GA based IC selection method can further improve the classification accuracy of ICA based prediction models.
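
The idea of GA based IC selection scored by LOOCV can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the classifier, the GA operators, or their parameters, so the nearest-centroid classifier, one-point crossover, bit-flip mutation, elitism, and all names below (`loocv_accuracy`, `ga_select`, `pop_size`, `p_mut`) are assumptions made for illustration. The matrix `X` stands in for per-sample IC coefficients that would come from an ICA step, which is not performed here; the toy data is synthetic.

```python
import random

def loocv_accuracy(X, y, mask):
    """LOOCV accuracy of a nearest-centroid classifier on the masked columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Class centroids are computed without the held-out sample i.
        sums, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)

def ga_select(X, y, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Search binary masks over columns of X, using LOOCV accuracy as fitness."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n_feat)] for _ in range(pop_size)]
    pop[0] = [True] * n_feat  # seed the population with the "all ICs" baseline

    def fitness(mask):
        return loocv_accuracy(X, y, mask)

    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [elite[0][:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per position.
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic stand-in for IC coefficients: columns 0 and 3 carry class signal,
# the remaining columns are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[2.0 * label + data_rng.gauss(0, 1), data_rng.gauss(0, 3), data_rng.gauss(0, 3),
      -2.0 * label + data_rng.gauss(0, 1), data_rng.gauss(0, 3), data_rng.gauss(0, 3)]
     for label in y]

mask, acc = ga_select(X, y)
baseline = loocv_accuracy(X, y, [True] * 6)
print("selected columns:", [j for j, keep in enumerate(mask) if keep])
print("LOOCV accuracy: subset =", acc, " all columns =", baseline)
```

Because the all-columns mask is seeded into the initial population and the best mask is always carried forward, the selected subset in this sketch never scores below the all-columns baseline. Note that a LOOCV score used as the search fitness is an optimistic estimate; an unbiased comparison would need samples held out from the selection process itself.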
