Lymphoma Cancer Classification Using Genetic Programming with SNR Features

Lymphoma cancer classification with DNA microarray data is one of the important problems in bioinformatics. Many machine learning techniques have been applied to the problem and have produced valuable results. However, the medical field requires not only a high-accuracy classifier but also in-depth analysis and understanding of the classification rules obtained. Since gene expression data have thousands of features, it is nearly impossible to represent and understand their complex relationships directly. In this paper, we adopt SNR (Signal-to-Noise Ratio) feature selection to reduce the dimensionality of the data, and then use genetic programming to generate cancer classification rules from the selected features. In experiments on the Lymphoma cancer dataset, the proposed method yielded 96.6% test accuracy on average, and it also discovered an arithmetic classification rule set that classifies all the samples correctly.
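
The SNR feature selection step can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the code below assumes the definition commonly used for two-class microarray data, (mean of class 1 - mean of class 0) / (standard deviation of class 1 + standard deviation of class 0), and ranks genes by the absolute value of that score. The function names `snr_scores` and `select_top_genes`, the ranking by absolute value, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def snr_scores(samples, labels):
    """Return one signal-to-noise score per gene (column).

    Assumed definition: (mean_1 - mean_0) / (std_1 + std_0), where class 1
    and class 0 are the two sample groups. Each class needs at least two
    samples so that the sample standard deviation is defined.
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(samples, labels) if y == 1]
        neg = [row[g] for row, y in zip(samples, labels) if y == 0]
        spread = stdev(pos) + stdev(neg)
        # Assumption: a gene with zero spread in both classes gets score 0.
        scores.append((mean(pos) - mean(neg)) / spread if spread > 0 else 0.0)
    return scores


def select_top_genes(samples, labels, k):
    """Return the indices of the k genes with the largest |SNR|."""
    scores = snr_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:k]


# Toy expression matrix: 6 samples (rows) x 3 genes (columns), made up for illustration.
toy_samples = [
    [5.0, 1.0, 3.0],
    [6.0, 1.2, 2.0],
    [5.5, 0.9, 4.0],
    [1.0, 1.1, 3.5],
    [1.5, 1.0, 2.5],
    [0.5, 1.3, 3.0],
]
toy_labels = [1, 1, 1, 0, 0, 0]

top_genes = select_top_genes(toy_samples, toy_labels, k=1)
print(top_genes)
```

In this toy matrix only the first gene separates the two classes clearly, so it receives the largest score. The selected gene indices would then serve as the terminal set for the genetic programming stage that evolves the arithmetic classification rules.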
