Reduct Generation and Classification of Gene Expression Data

Identifying the gene subsets that discern between the available samples of gene microarray data is an important task in bioinformatics. Because each sample contains a large number of genes, the search space of candidate subsets is exponentially large. The main challenge is to reduce or remove redundant genes without affecting discernibility between objects. Reducts, from rough set theory, correspond to minimal subsets of essential genes. We present an algorithm for generating reducts from gene microarray data. It proceeds in three stages: preprocessing the gene expression data, discretizing real-valued attributes into categorical ones, and generating reducts with a positive-region-based approach. For comparison, other approaches to reduct generation are also discussed. Results on benchmark gene expression datasets demonstrate a reduction of more than 90% of redundant genes.
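The pipeline described above (discretization followed by positive-region-based reduct generation) can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything here is an assumption for illustration: equal-width binning stands in for the discretization step, a greedy forward search in the style of QuickReduct stands in for the reduct generator, and the function names and toy expression values are hypothetical. A greedy search of this kind preserves the positive region but may return a superset of a true (minimal) reduct.

```python
from collections import defaultdict


def discretize(column, bins=2):
    """Equal-width binning of one real-valued gene (assumed scheme, not the paper's)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    # Clamp so the maximum value falls into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in column]


def positive_region_size(rows, labels, attrs):
    """Count samples whose equivalence class under `attrs` carries a single label."""
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    return sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)


def greedy_reduct(rows, labels):
    """Add the gene that most enlarges the positive region until it matches
    the positive region of the full gene set."""
    all_attrs = list(range(len(rows[0])))
    target = positive_region_size(rows, labels, all_attrs)
    reduct = []
    while positive_region_size(rows, labels, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: positive_region_size(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct


# Hypothetical toy data: 4 samples x 3 genes, two classes.
expression = [
    [0.1, 5.0, 2.0],
    [0.2, 1.0, 2.9],
    [0.9, 5.1, 2.1],
    [0.8, 1.2, 3.0],
]
labels = ["ALL", "ALL", "AML", "AML"]

# Discretize gene by gene, then transpose back to one row per sample.
columns = [discretize([s[g] for s in expression]) for g in range(3)]
rows = [list(r) for r in zip(*columns)]

reduct = greedy_reduct(rows, labels)
print(reduct)  # indices of the selected genes
```

In this toy case only the first gene separates the two classes after binning, so the search stops after selecting it and the other two genes are discarded as redundant.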
