Feature selection using mutual information based uncertainty measures for tumor classification.

Feature selection is a key problem in tumor classification and related tasks. This paper presents a tumor classification approach based on feature selection with neighborhood rough sets. First, several uncertainty measures, including neighborhood entropy, conditional neighborhood entropy, neighborhood mutual information, and neighborhood conditional mutual information, are introduced to evaluate the relevance between genes and the decision attribute in the neighborhood rough set model. Then important properties and propositions of these measures are investigated, and the relationships among the measures are established. By combining an improved minimal-Redundancy-Maximal-Relevance (mRMR) criterion with a sequential forward greedy search strategy, a novel feature selection algorithm with low time complexity is proposed. Finally, the proposed approach is applied to several cancer classification tasks. Experimental results show that the proposed algorithm is efficient and effective.
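
To make the pipeline described above more concrete, the following is a minimal Python sketch of neighborhood entropy, neighborhood mutual information, and a sequential forward greedy search driven by an mRMR-style score. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the improved mRMR criterion, so the classical "relevance minus mean redundancy" score is used in its place; the entropy and mutual information formulas follow the commonly used neighborhood formulation and may differ in detail from the paper's definitions; the Chebyshev distance, the radius `delta`, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def neighborhoods(X, delta):
    """Boolean n x n matrix; entry (i, j) is True when sample j lies within
    Chebyshev distance delta of sample i (assumed metric, not from the paper)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    dist = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)
    return dist <= delta

def neighborhood_entropy(N):
    """NH = -(1/n) * sum_i log2(|N(x_i)| / n)."""
    n = N.shape[0]
    return -np.mean(np.log2(N.sum(axis=1) / n))

def neighborhood_mutual_information(Na, Nb):
    """NMI = -(1/n) * sum_i log2(|Na(x_i)| * |Nb(x_i)| / (n * |Na(x_i) & Nb(x_i)|)).
    The joint count is at least 1 because every sample is its own neighbor."""
    n = Na.shape[0]
    joint = (Na & Nb).sum(axis=1)
    return -np.mean(np.log2(Na.sum(axis=1) * Nb.sum(axis=1) / (n * joint)))

def mrmr_select(X, y, k, delta=0.1):
    """Sequential forward greedy search with the classical mRMR difference
    score (a stand-in for the paper's improved criterion)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    k = min(k, X.shape[1])
    # The decision attribute is nominal, so its "neighborhood" is the class.
    Nd = y[:, None] == y[None, :]
    # One n x n relation per feature: fine for a demo, costly for large n.
    Nf = [neighborhoods(X[:, j], delta) for j in range(X.shape[1])]
    relevance = np.array([neighborhood_mutual_information(N, Nd) for N in Nf])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean(
                [neighborhood_mutual_information(Nf[j], Nf[s]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Toy data: column 1 duplicates column 0; column 2 is nearly independent of it.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = np.array([0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
f2 = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
X = np.column_stack([f0, f0, f2])
print(mrmr_select(X, y, k=2))
```

In this toy example the duplicated column is penalized by its redundancy with the already selected column, so the second pick is the non-redundant column rather than the copy. With real expression data, features would typically be rescaled to a common range before a single `delta` is applied.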
