Improved Measures of Redundancy and Relevance for mRMR Feature Selection

Many biological and medical datasets have a large number of features. Feature selection is a data preprocessing step that removes noise from the data and saves computing time when a dataset has several hundred thousand or more features. Another goal of feature selection is to improve classification accuracy in machine learning tasks. Minimum Redundancy Maximum Relevance (mRMR) is a well-known feature selection algorithm that selects features by calculating the redundancy between features and the relevance between each feature and the class vector. The original mRMR uses mutual information to measure both redundancy and relevance. In this research, we propose a method to improve the performance of mRMR feature selection: we apply Pearson’s correlation coefficient as the measure of redundancy and the R-value as the measure of relevance. To compare the original mRMR with the proposed method, we selected features from various datasets using both methods and then performed a classification test, using classification accuracy as the performance measure. In many cases, the proposed method showed higher accuracy than the original mRMR.
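The greedy selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several points are assumptions: the function name `greedy_mrmr` is hypothetical, redundancy is taken as the mean absolute Pearson correlation with the features already selected, the two terms are combined by subtraction (one common mRMR criterion; the abstract does not say which is used), and relevance is passed in as a precomputed vector because the R-value computation is not defined here.

```python
import numpy as np

def greedy_mrmr(X, relevance, k):
    """Greedy mRMR-style selection (illustrative sketch, not the paper's code).

    X         : (n_samples, n_features) array of feature values
    relevance : length-n_features vector of precomputed relevance scores
                (assumed to come from any relevance measure, e.g. an R-value
                based score; how it is computed is outside this sketch)
    k         : number of features to select
    """
    X = np.asarray(X, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    n_features = X.shape[1]
    if k <= 0:
        return []
    # Absolute Pearson correlation between every pair of feature columns.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Start with the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        # Redundancy of each candidate: mean |r| with the features chosen so far.
        redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        # Assumed combination rule: relevance minus redundancy.
        scores = relevance[remaining] - redundancy
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Toy data: feature 1 is a linear copy of feature 0; feature 2 is uncorrelated with both.
x0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X_demo = np.column_stack([x0, 2 * x0 + 1, [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]])
relevance_demo = np.array([0.9, 0.85, 0.5])  # made-up scores for illustration
selected_demo = greedy_mrmr(X_demo, relevance_demo, k=2)
print(selected_demo)
```

In this toy case the second pick is feature 2 rather than the nominally more relevant feature 1, because feature 1 is perfectly correlated with the feature already selected. Taking relevance as an input keeps the sketch independent of any particular relevance measure.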
