Efficient Gene Selection with Rough Sets from Gene Expression Data

The main challenge in selecting genes from gene expression datasets is to remove redundant genes without reducing the discernibility between objects. A pipelined approach that combines feature ranking with rough set attribute reduction is proposed for gene selection. In the first step, feature ranking narrows the gene space and the top-ranked genes are selected; in the second step, rough sets are used to induce a minimal reduct, which eliminates the remaining redundant attributes. The approach is evaluated on Leukemia gene expression data and gives good results without any preprocessing of the data. The experimental results show that the approach successfully selects highly discriminative genes for the cancer classification task.
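The two-stage pipeline described above can be illustrated with a minimal sketch. The abstract does not state which ranking criterion, discretization, or reduct algorithm is used, so everything specific here is an assumption made for illustration: a signal-to-noise score for ranking, a median split to obtain discrete attribute values, and a greedy covering heuristic that approximates (but does not guarantee) a minimal reduct. The function names `rank_genes`, `discretize`, `greedy_reduct`, and `select_genes`, and the toy data, are hypothetical.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def rank_genes(X, y, top_k):
    """Stage 1 (assumed criterion): rank genes by a signal-to-noise score
    for two classes labeled 0 and 1, and return the top_k gene indices."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b)
        if spread == 0:
            spread = 1e-12  # avoid division by zero for constant genes
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:top_k]


def discretize(X, genes):
    """Assumed scheme: binarize each selected gene at its median."""
    cuts = {g: median([row[g] for row in X]) for g in genes}
    return [tuple(int(row[g] > cuts[g]) for g in genes) for row in X]


def greedy_reduct(D, y):
    """Stage 2 (heuristic): add attributes until every pair of objects with
    different labels that the full attribute set discerns is still discerned."""
    pairs = {
        (i, j)
        for i, j in combinations(range(len(D)), 2)
        if y[i] != y[j] and D[i] != D[j]
    }
    reduct = []
    while pairs:
        # Pick the attribute that discerns the most remaining pairs.
        best = max(
            range(len(D[0])),
            key=lambda a: sum(D[i][a] != D[j][a] for i, j in pairs),
        )
        reduct.append(best)
        pairs = {(i, j) for i, j in pairs if D[i][best] == D[j][best]}
    return reduct


def select_genes(X, y, top_k):
    """Run ranking, discretization, and reduction; return original gene indices."""
    genes = rank_genes(X, y, top_k)
    D = discretize(X, genes)
    return [genes[a] for a in greedy_reduct(D, y)]


# Toy data (invented): 6 samples, 4 genes; gene 1 is a scaled copy of gene 0.
X = [
    [1.0, 2.0, 3.0, 0.5],
    [1.2, 2.4, 1.0, 0.7],
    [0.9, 1.8, 2.0, 0.6],
    [5.0, 10.0, 2.5, 0.6],
    [5.1, 10.2, 1.5, 0.5],
    [4.8, 9.6, 3.5, 0.7],
]
y = [0, 0, 0, 1, 1, 1]
selected = select_genes(X, y, top_k=2)
print(selected)
```

In this toy example the ranking step keeps the two informative genes, and the reduction step then drops one of them because it carries no additional discerning power, which is the redundancy removal the approach targets. A greedy cover is only one way to approximate a reduct; methods based on the discernibility matrix are another option.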
