Exploring Features and Classifiers to Classify Gene Expression Profiles of Acute Leukemia

Bioinformatics has recently drawn considerable attention as a way to analyze biological genomic information efficiently with information technology, especially pattern recognition. In this paper, we explore a broad range of features and classifiers through a comparative study of the most promising feature selection methods and machine learning classifiers. Gene expression data obtained by DNA microarray from the marrow of patients with either acute myeloid leukemia or acute lymphoblastic leukemia are used to predict the cancer class. Pearson's and Spearman's correlation coefficients, Euclidean distance, cosine coefficient, information gain, mutual information, and signal-to-noise ratio are used for feature selection. Backpropagation neural network, self-organizing map, structure-adaptive self-organizing map, support vector machine, inductive decision tree, and k-nearest neighbor are used for classification. Experimental results indicate that the backpropagation neural network combined with Pearson's correlation coefficient produces the best result, a recognition rate of 97.1% on the test data.
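To make the pipeline described above concrete, the following is a minimal sketch of two of the listed feature selection criteria (Pearson's correlation coefficient and signal-to-noise ratio) followed by a k-nearest neighbor classifier. It is an illustration under stated assumptions, not the implementation used in the paper: the function names, the 0/1 label encoding, the signal-to-noise formula (difference of class means over the sum of class standard deviations), the number of selected genes, and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def pearson_scores(X, y):
    """Pearson correlation of each gene (column of X) with the class labels y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant genes get score 0
    return (Xc * yc[:, None]).sum(axis=0) / denom

def signal_to_noise_scores(X, y):
    """Assumed form: (mean_1 - mean_0) / (std_1 + std_0), computed per gene."""
    pos, neg = X[y == 1], X[y == 0]
    spread = pos.std(axis=0) + neg.std(axis=0)
    spread = np.where(spread == 0, 1.0, spread)
    return (pos.mean(axis=0) - neg.mean(axis=0)) / spread

def top_k_genes(scores, k):
    """Indices of the k genes with the largest absolute score."""
    return np.argsort(-np.abs(scores))[:k]

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

if __name__ == "__main__":
    # Synthetic stand-in for expression profiles: rows are samples, columns are genes.
    # Labels 0 and 1 stand for the two leukemia classes; this is NOT the real dataset.
    rng = np.random.default_rng(0)
    y_train = np.array([0] * 20 + [1] * 20)
    y_test = np.array([0] * 10 + [1] * 10)
    X_train = rng.normal(size=(40, 200))
    X_test = rng.normal(size=(20, 200))
    informative = [3, 42, 117]  # hypothetical class-dependent genes
    X_train[:, informative] += 3.0 * y_train[:, None]
    X_test[:, informative] += 3.0 * y_test[:, None]

    selected = top_k_genes(pearson_scores(X_train, y_train), k=10)
    predicted = knn_predict(X_train[:, selected], y_train, X_test[:, selected], k=3)
    print("selected genes:", sorted(selected.tolist()))
    print("test accuracy:", (predicted == y_test).mean())
```

Swapping `pearson_scores` for `signal_to_noise_scores` changes only the ranking criterion, which is the kind of feature-by-classifier comparison the study carries out across all listed methods.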
