Benchmark for filter methods for feature selection in high-dimensional classification data

Abstract: Feature selection is one of the most fundamental problems in machine learning and has drawn increasing attention due to high-dimensional data sets emerging from fields such as bioinformatics. Filter methods play an important role in feature selection because they can be combined with any machine learning model and can substantially reduce the run time of machine learning algorithms. The aim of the analyses is to review how different filter methods work, to compare their performance with respect to both run time and predictive accuracy, and to provide guidance for applications. Based on 16 high-dimensional classification data sets, 22 filter methods are analyzed with respect to run time and accuracy when combined with a classification method. It is concluded that no group of filter methods always outperforms all other methods, but recommendations are made on filter methods that perform well on many of the data sets. In addition, groups of filters that rank the features in a similar order are identified. The analyses use the R machine learning package mlr, which provides a uniform programming API and is therefore a convenient tool for conducting feature selection with filter methods.
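
To illustrate the general idea of a filter method, the following is a minimal sketch in Python using only the standard library. It is not taken from the benchmark and does not use mlr. It assumes a simple univariate filter that scores each feature by the absolute Pearson correlation with a binary class label, ranks the features by that score, and keeps the top k. The function names (`pearson`, `rank_features`, `select_top_k`) and the toy data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no information for this score
    return cov / sqrt(var_x * var_y)


def rank_features(X, y):
    """Score every column of X against y; return (indices sorted best first, scores)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append(abs(pearson(column, y)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring features; the model is never consulted."""
    order, _ = rank_features(X, y)
    keep = sorted(order[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Toy data (hypothetical): 4 observations, 3 features, binary class label.
X = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.4],
    [4.0, 5.0, 0.2],
]
y = [0, 0, 1, 1]

keep, X_reduced = select_top_k(X, y, k=1)
print(keep)
```

Because the score is computed from the data alone, the reduced data set `X_reduced` can be passed to any classifier afterwards, which is the property that lets filter methods be combined with arbitrary learning algorithms.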
