Fast‐mRMR: Fast Minimum Redundancy Maximum Relevance Algorithm for High‐Dimensional Big Data

With the advent of large‐scale problems, feature selection has become a fundamental preprocessing step for reducing input dimensionality. The minimum‐redundancy‐maximum‐relevance (mRMR) selector is considered one of the most relevant methods for dimensionality reduction because of its high accuracy. However, it is computationally expensive, and its cost rises sharply with the number of features. This paper presents fast‐mRMR, an extension of mRMR that aims to overcome this computational burden. Along with fast‐mRMR, we provide a package with three implementations of the algorithm for different platforms: CPU for sequential execution, GPU (graphics processing unit) for parallel computing, and Apache Spark for distributed computing with big data technologies.
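To make the cost concrete, the following is a minimal sketch of a plain greedy mRMR selector on discrete data. It is not the fast‐mRMR package code. The function names `mutual_information` and `mrmr`, the column‐list input layout, and the difference form of the criterion (relevance minus mean redundancy) are assumptions made for illustration. Each selection round scores every remaining feature against the most recently selected one, which is why the running time grows quickly with the number of features.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed pairs only; unobserved pairs contribute zero.
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR, difference form (illustrative sketch).

    features: list of columns, each a sequence of discrete values.
    target:   sequence of class labels, same length as each column.
    k:        number of features to select.
    Returns the indices of the selected features in selection order.
    """
    relevance = [mutual_information(f, target) for f in features]
    # Running sum of MI between each candidate and the features selected so far.
    redundancy_sum = [0.0] * len(features)
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=relevance.__getitem__)]

    while len(selected) < min(k, len(features)):
        last = features[selected[-1]]
        best, best_score = None, float("-inf")
        for j, f in enumerate(features):
            if j in selected:
                continue
            # Only the pair (candidate, last selected) is new in this round.
            redundancy_sum[j] += mutual_information(f, last)
            score = relevance[j] - redundancy_sum[j] / len(selected)
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


if __name__ == "__main__":
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    y = [2 * ai + bi for ai, bi in zip(a, b)]
    # The duplicate of `a` is as relevant as `a` but fully redundant with it.
    print(mrmr([a, list(a), b], y, 2))
```

In the small example, the duplicated column is passed over in favour of `b`, because its redundancy with the already selected column cancels its relevance. Keeping a running redundancy sum is a design choice of this sketch; it avoids recomputing pairs already evaluated in earlier rounds.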
