Improving Classification Performance by Merging Distinct Feature Sets of Similar Quality Generated by Multiple Initializations of mRMR

The success of machine learning algorithms often depends on a combination of model size, computational cost, and interpretability. One way to optimize these properties is feature selection: computational cost and model size can be reduced by discarding features with low relevance, and feature selection can additionally provide a deeper understanding of the importance of individual features. This work focuses on the minimal-redundancy-maximal-relevance (mRMR) algorithm, a filter method for feature selection that uses pairwise mutual information to decide which features are relevant. The algorithm is initialized with the feature of highest relevance according to this measure; an iterative procedure then selects, at each step, the feature that offers high relevance while maintaining low redundancy with the previously selected features. This work extensively studies the distinct feature sets that are generated when the mRMR algorithm is run multiple times, each run initialized with a different feature in order of descending relevance. By exploiting information about the order in which the iterative algorithm chooses features across the various runs, a strategy is proposed that merges all initializations into a single combined feature set. Applying the proposed strategy to four datasets of different sizes and two classification algorithms shows that the resulting feature sets are significantly better than those produced by the original mRMR algorithm for the given classification task. The proposed method is well suited for cases where it is not feasible to use wrapper methods to increase classification accuracy.
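A minimal sketch of the greedy mRMR selection and the multi-initialization merge may help clarify the procedure. The helper names (mrmr, merged_feature_set) and the rank-averaging merge rule are illustrative assumptions: the abstract states only that the combination strategy exploits the order in which features are chosen across runs, not the exact aggregation rule. The sketch also assumes discretized feature values so that pairwise mutual information can be estimated with sklearn.metrics.mutual_info_score.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score


def mrmr(X, y, k, first):
    """Greedy mRMR: start from feature index `first`, then repeatedly add
    the candidate with the best relevance-minus-mean-redundancy score."""
    relevance = mutual_info_classif(X, y)  # I(f; y) for every feature
    selected = [first]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for f in range(X.shape[1]):
            if f in selected:
                continue
            # mean pairwise MI with the already selected features
            # (assumes discretized feature values)
            redundancy = np.mean([mutual_info_score(X[:, f], X[:, s])
                                  for s in selected])
            score = relevance[f] - redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected


def merged_feature_set(X, y, k, n_inits):
    """Run mRMR once per initialization (the `n_inits` most relevant
    features, in descending order of relevance) and merge the resulting
    sets by average selection rank -- an illustrative stand-in for the
    paper's order-based combination strategy."""
    relevance = mutual_info_classif(X, y)
    inits = np.argsort(relevance)[::-1][:n_inits]
    ranks = {}
    for first in inits:
        for rank, f in enumerate(mrmr(X, y, k, first)):
            ranks.setdefault(f, []).append(rank)
    # features selected earlier on average across runs come first
    ordered = sorted(ranks, key=lambda f: np.mean(ranks[f]))
    return ordered[:k]
```

For example, merged_feature_set(X, y, k=20, n_inits=10) would run ten initializations and merge them into a single 20-feature set; the early-rank preference reflects the intuition that features chosen early in many runs are robustly relevant.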
