Bio-inspired machine learning in microarray gene selection and cancer classification

Microarray technology now allows an entire genome to be spotted on a single chip, letting biologists monitor the activity of thousands of genes simultaneously. Machine learning approaches are well suited to discovering the complex relationships between genes under controlled experimental conditions, and to classifying microarray data by identifying a subset of informative genes embedded in a large, noisy, high-dimensional data set that involves multiple classes. In this paper, a hybrid system that integrates genetic algorithms and decision trees is proposed for analyzing gene expression and predicting gene functionality for cancer classification. The learning capacity of the decision trees used as base learners is boosted by a feature selection method. Preliminary experimental results demonstrate that the hybrid system can mine accurate classification rules, with predictive performance comparable to that of traditional machine learning algorithms.
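To make the hybrid idea concrete, the following is a minimal sketch of a genetic algorithm that searches over gene subsets and scores each subset by the validation accuracy of a small decision tree trained on only those genes. It is an illustration under stated assumptions, not the paper's actual system: the data are synthetic, the tree is a depth-2 Gini-based tree standing in for whatever tree learner the authors used, and every name and parameter here (`make_data`, `build_tree`, `fitness`, `evolve`, the population size, the mutation rate, the per-gene penalty) is hypothetical.

```python
import random

def make_data(n_samples=40, n_genes=20, informative=(3, 11), seed=0):
    """Synthetic two-class 'expression' matrix (assumption: not real microarray data).
    Only the genes listed in `informative` carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(X, y, genes, depth=2):
    """Greedy binary tree restricted to the allowed genes.
    Returns a class label (leaf) or a tuple (gene, threshold, left, right)."""
    majority = int(2 * sum(y) >= len(y))
    if depth == 0 or len(set(y)) == 1:
        return majority
    best = None
    for g in genes:
        values = sorted({row[g] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [i for i, row in enumerate(X) if row[g] <= t]
            right = [i for i, row in enumerate(X) if row[g] > t]
            score = sum(len(s) * gini([y[i] for i in s]) for s in (left, right))
            if best is None or score < best[0]:
                best = (score, g, t, left, right)
    if best is None:
        return majority
    _, g, t, left, right = best
    kids = [build_tree([X[i] for i in s], [y[i] for i in s], genes, depth - 1)
            for s in (left, right)]
    return (g, t, kids[0], kids[1])

def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        g, t, left, right = tree
        tree = left if row[g] <= t else right
    return tree

def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy of a tree built on the masked genes,
    minus a small per-gene penalty that favors compact subsets."""
    genes = [g for g, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    tree = build_tree(train[0], train[1], genes)
    hits = sum(predict(tree, row) == lab for row, lab in zip(*valid))
    return hits / len(valid[1]) - penalty * len(genes)

def evolve(train, valid, n_genes, pop_size=12, generations=10, p_mut=0.05, seed=1):
    """Simple GA: tournament selection, uniform crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for gen in range(generations + 1):
        scored = sorted(((fitness(m, train, valid), m) for m in pop),
                        key=lambda s: s[0], reverse=True)
        history.append(scored[0][0])
        if gen == generations:
            break
        children = [scored[0][1][:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            mom = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            dad = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = children
    return scored[0][1], history

X, y = make_data()
train = (X[:24], y[:24])
valid = (X[24:], y[24:])
best_mask, history = evolve(train, valid, n_genes=len(X[0]))
selected = [g for g, bit in enumerate(best_mask) if bit]
print("selected genes:", selected)
print("best fitness per generation:", [round(h, 3) for h in history])
```

Because the best mask of each generation is copied unchanged into the next, the best fitness recorded in `history` never decreases. In a real study the fitness would more plausibly come from cross-validation on held-out samples rather than from a single split, since a single small validation set is easy to overfit when the search evaluates many subsets.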
