A Novel Feature Gene Selection Method Based on Neighborhood Mutual Information

DNA microarray technology can measure the activity of tens of thousands of genes in cells and has been widely used in clinical diagnosis. However, microarray data are high-dimensional with small sample sizes, and the many irrelevant and redundant genes they contain degrade the performance of classification algorithms. Mutual information is a very effective criterion that has been widely used for feature gene selection, but it cannot be applied directly to continuous features. This paper therefore proposes a novel feature gene selection method that resolves this problem. First, the ReliefF algorithm eliminates a large number of irrelevant genes from the original data, yielding a candidate gene subset. Second, an algorithm based on neighborhood mutual information and a forward greedy search strategy, which handles continuous features directly, selects feature genes from this candidate subset. Because the neighborhood radius strongly affects reduction performance, a differential evolution algorithm is used to optimize the radius before reduction. Simulation results on six benchmark microarray datasets show that the method achieves higher classification accuracy while using as few genes as possible, and that neighborhood mutual information can handle continuous features directly. The selected feature genes are important for understanding microarray data and for identifying cancer-causing genes. The proposed method is thus an effective and efficient approach to feature gene selection.
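The two-stage pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: genes are min-max normalized to [0, 1]; ReliefF is a simplified variant that uses every sample as a probe; neighborhoods use the Chebyshev (infinity-norm) distance; the greedy search stops when the NMI gain falls below a hypothetical `min_gain`; and the radius `delta` is fixed by the caller instead of being tuned by differential evolution. All function and parameter names are hypothetical. The neighborhood mutual information computed is NMI = -(1/n) Σ_i log(|δ_X(x_i)| · |δ_y(x_i)| / (n · |δ_X(x_i) ∩ δ_y(x_i)|)), where the "neighborhood" of a sample in the class labels is the set of samples with the same label.

```python
import numpy as np


def minmax_normalize(X):
    """Scale every column (gene) to [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant genes
    return (X - lo) / span


def relieff(X, y, n_neighbors=5):
    """Simplified ReliefF weights, using every sample as a probe.
    Assumes X is normalized to [0, 1] and y has at least two classes."""
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)  # Manhattan distance to probe i
        for c in classes:
            members = np.where((y == c) & (np.arange(n) != i))[0]
            if members.size == 0:
                continue
            near = members[np.argsort(dist[members])[:n_neighbors]]
            diff = np.abs(X[near] - X[i]).mean(axis=0)  # mean per-gene difference
            if c == y[i]:
                w -= diff / n  # near hits lower the weight
            else:  # near misses raise it, weighted by class prior
                w += prior[c] / (1.0 - prior[y[i]]) * diff / n
    return w


def neighborhood_mi(X, y, delta):
    """Neighborhood mutual information between the genes in X (n x k, k >= 1)
    and the discrete class labels y, for neighborhood radius delta."""
    n = X.shape[0]
    # delta-neighborhoods under the Chebyshev (infinity-norm) distance
    nb_x = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2) <= delta
    nb_y = y[:, None] == y[None, :]  # class "neighborhood" = same label
    sx, sy = nb_x.sum(axis=1), nb_y.sum(axis=1)
    sxy = (nb_x & nb_y).sum(axis=1)  # >= 1: each sample is its own neighbor
    return float(-np.mean(np.log(sx * sy / (n * sxy))))


def greedy_nmi_selection(X, y, delta, max_features=10, min_gain=1e-4):
    """Forward greedy search: repeatedly add the gene maximizing
    NMI(selected + [gene]; y); stop when the gain drops below min_gain."""
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining and len(selected) < max_features:
        scores = [neighborhood_mi(X[:, selected + [a]], y, delta) for a in remaining]
        k = int(np.argmax(scores))
        if scores[k] - best < min_gain:
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best


def select_genes(X, y, n_candidates=50, delta=0.15, **kwargs):
    """ReliefF filter keeps the top n_candidates genes, then greedy NMI
    selects from them. delta is fixed here (no differential evolution)."""
    X = minmax_normalize(X)
    y = np.asarray(y)
    candidates = np.argsort(relieff(X, y))[::-1][:n_candidates]
    local, score = greedy_nmi_selection(X[:, candidates], y, delta, **kwargs)
    return [int(candidates[j]) for j in local], score
```

For example, on synthetic data where only gene 7 determines the class, `select_genes(X, y, n_candidates=5, delta=0.1, max_features=3)` should rank gene 7 first. Because NMI is computed on neighborhoods of raw continuous values, no discretization step is needed, which is the property the abstract emphasizes. In the full method, `delta` would be chosen by differential evolution rather than passed in by hand.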