Selection of Discriminative Genes from Microarray Data

This chapter establishes the effectiveness of the fuzzy equivalence partition matrix (FEPM) for selecting genes from microarray data and compares its performance with that of some existing methods on a set of microarray gene expression data sets. It first briefly introduces the evaluation criteria used to compute both the relevance and the redundancy of genes. Measuring gene-class relevance and gene-gene redundancy with information-theoretic measures such as entropy, mutual information, and f-information requires approximating the true probability density functions of continuous-valued genes, so the chapter presents several approaches to approximating these densities for gene expression data. It then describes the problem of gene selection from microarray data sets using information-theoretic approaches. Finally, it reports a few case studies and compares the different approximation methods.

Keywords: fuzzy logic; numerical analysis; probability density function
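Relevance and redundancy can only be computed after the densities of continuous-valued genes have been approximated. The following minimal Python/NumPy sketch illustrates that pipeline under several assumptions that are not taken from the chapter. A crisp three-interval discretization (thresholds at the mean minus/plus half a standard deviation) stands in for a discretization approach. A partition matrix built from pi-shaped membership functions (centers at the mean and at the mean minus/plus one standard deviation, with the product t-norm for joint memberships) stands in for the FEPM. Genes are chosen with a greedy mRMR-style score (relevance minus mean redundancy), which may differ from the chapter's own criterion. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def crisp_partition(x):
    """Crisp 3-interval discretization (hypothetical thresholds: mean -/+ std/2).

    Returns a (3, n) 0/1 matrix; each column (sample) sums to 1.
    """
    mu, sd = x.mean(), x.std()
    idx = np.digitize(x, [mu - sd / 2, mu + sd / 2])  # 0 = low, 1 = medium, 2 = high
    m = np.zeros((3, x.size))
    m[idx, np.arange(x.size)] = 1.0
    return m


def pi_membership(x, c, lam):
    """Standard pi-shaped membership function with center c and radius lam."""
    d = np.abs(x - c) / lam
    return np.where(d <= 0.5, 1 - 2 * d**2, np.where(d <= 1, 2 * (1 - d) ** 2, 0.0))


def fuzzy_partition(x):
    """FEPM-style (3, n) matrix of low/medium/high memberships.

    Assumed design: pi functions centered at mean - std, mean, mean + std with
    radius std; values beyond the outer centers are clipped so every sample is
    covered, then each column is normalized to sum to 1.
    """
    mu, sd = x.mean(), x.std() or 1.0  # guard against constant genes
    centers = (mu - sd, mu, mu + sd)
    xc = np.clip(x, centers[0], centers[-1])
    m = np.vstack([pi_membership(xc, c, sd) for c in centers])
    return m / m.sum(axis=0, keepdims=True)


def class_partition(y):
    """Crisp partition of the samples by class label."""
    return (y[None, :] == np.unique(y)[:, None]).astype(float)


def entropy(m):
    """Entropy (bits) of the per-class relative frequencies, i.e. the row means of m."""
    p = m.mean(axis=1)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def joint(ma, mb):
    """Joint partition: row (i, k) is the product of class i of A and class k of B."""
    return (ma[:, None, :] * mb[None, :, :]).reshape(-1, ma.shape[1])


def mutual_information(ma, mb):
    """I(A; B) = H(A) + H(B) - H(A, B), computed from partition matrices."""
    return entropy(ma) + entropy(mb) - entropy(joint(ma, mb))


def select_genes(X, y, k, partition=fuzzy_partition):
    """Greedy mRMR-style selection: maximize relevance minus mean redundancy."""
    parts = [partition(X[:, g]) for g in range(X.shape[1])]
    cls = class_partition(y)
    relevance = np.array([mutual_information(p, cls) for p in parts])
    selected = [int(np.argmax(relevance))]  # start from the most relevant gene
    while len(selected) < k:
        best, best_score = None, -np.inf
        for g in range(X.shape[1]):
            if g in selected:
                continue
            redundancy = np.mean([mutual_information(parts[g], parts[s]) for s in selected])
            if relevance[g] - redundancy > best_score:
                best, best_score = g, relevance[g] - redundancy
        selected.append(best)
    return selected


# Synthetic demo: gene 0 separates the classes, gene 1 is a near-copy of gene 0,
# gene 2 is weakly informative, and genes 3-7 are pure noise.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
g0 = rng.normal(4.0 * y, 1.0)
X = np.column_stack([g0, g0 + rng.normal(0, 0.01, n), rng.normal(1.5 * y, 1.0),
                     rng.normal(size=(n, 5))])
for name, part in [("crisp", crisp_partition), ("fuzzy", fuzzy_partition)]:
    print(name, select_genes(X, y, 3, partition=part))
```

A crisp discretization is simply a partition matrix whose entries are 0 or 1. The same entropy and mutual-information functions therefore serve both approximations, and changing the `partition` argument is enough to compare them on the same data. A Parzen-window density estimator, another common way to approximate the densities, is not shown here.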
