Efficient Randomized Feature Selection Algorithms

Feature selection is a core problem in machine learning and plays an important role in making efficient, explainable machine-driven decisions. Embedded feature selection methods, such as decision trees and LASSO, suffer from learner dependency and do not transfer well to many popular learners. Wrapper methods, which can fit arbitrary learning models, are receiving growing interest in many scientific fields. To search for relevant features effectively in wrapper methods, many randomized schemes have been proposed. In this paper, we present efficient randomized feature selection algorithms empowered by automatic breadth-searching and attention-searching adjustments. Our schemes are generic and highly parallelizable in nature and can easily be applied to many related algorithms. Theoretical analysis proves the efficiency of our algorithms, and extensive experiments on synthetic and real datasets show that our techniques achieve significant improvements in both the quality of the selected features and the selection time.
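To make the wrapper setting concrete, the following is a minimal illustrative sketch of a generic randomized wrapper search: candidate feature subsets are sampled at random, each is scored by a learner-supplied evaluation function, and the best subset seen is retained. The function name, parameters, and scoring interface are hypothetical and do not reproduce the paper's breadth-searching or attention-searching adjustments.

```python
import random

def randomized_wrapper_select(features, score, n_iter=200, subset_size=3, seed=0):
    """Illustrative randomized wrapper search (not the paper's algorithm):
    sample random feature subsets, evaluate each with a user-supplied
    score function (e.g. cross-validated accuracy of any learner),
    and keep the highest-scoring subset encountered."""
    rng = random.Random(seed)
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Sample a candidate subset uniformly at random.
        subset = tuple(sorted(rng.sample(features, subset_size)))
        s = score(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score

# Toy usage: the scorer simply counts how many "relevant" features
# (a hypothetical ground truth) appear in the candidate subset.
relevant = {0, 1, 2}
best, best_s = randomized_wrapper_select(
    list(range(20)), lambda subset: len(relevant & set(subset))
)
```

Because each candidate subset is evaluated independently, this loop parallelizes trivially across iterations, which is the property the abstract highlights for this family of schemes.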
