Gene Selection with Rough Sets for the Molecular Diagnosing of Tumor Based on Support Vector Machines

The development of microarray technology has stimulated interest in its use for the clinical diagnosis of tumors and for drug discovery. However, accurately classifying tumors by selecting tumor-related genes from among thousands of genes is a difficult task because of the large number of redundant genes. We therefore propose a novel hybrid approach that combines rough set theory with support vector machines to further improve the classification performance on gene expression data. The approach is assessed on two well-known tumor datasets. The experiments indicate that gene selection by rough set attribute reduction is effective, because most of the selected genes are relevant to tumors, and that the support vector machine classifier performs better on the selected informative genes.
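As a rough illustration of the gene selection step, the sketch below computes the rough set dependency degree (the fraction of samples whose equivalence class under the chosen genes carries a single class label) and greedily grows a gene subset until it matches the dependency of the full gene set. This is a QuickReduct-style heuristic chosen for illustration; the abstract does not state which reduction algorithm the authors use. The function names, the toy data, and the assumption that expression values have already been discretized (0 = low, 1 = high) are all hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group sample indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of samples lying in label-pure equivalence classes."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedy attribute reduction (illustrative heuristic, not the paper's method)."""
    n_attrs = len(rows[0])
    target = dependency(rows, labels, list(range(n_attrs)))
    selected, best = [], 0.0
    while best < target:
        pick, pick_score = None, best
        for a in range(n_attrs):
            if a in selected:
                continue
            score = dependency(rows, labels, selected + [a])
            if score > pick_score:
                pick, pick_score = a, score
        if pick is None:  # no single gene improves the dependency; stop
            break
        selected.append(pick)
        best = pick_score
    return selected


# Hypothetical toy data: 6 samples x 4 discretized genes.
rows = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

selected_genes = quick_reduct(rows, labels)
print(selected_genes)
```

In the hybrid scheme described above, the columns indexed by `selected_genes` would then be passed to a support vector machine classifier; that step is omitted here because the abstract gives no kernel or parameter settings.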
