Selecting Efficient Features via a Hyper-Heuristic Approach

With the emergence of huge databases and the need for efficient learning algorithms on these datasets, new problems have appeared, and several methods have been proposed to address them by selecting efficient features. Feature selection is the problem of finding, among all features, a subset that improves accuracy and reduces complexity. One way to solve this problem is to evaluate every possible feature subset. However, this is an exhaustive search with high computational complexity, since a set of n features has 2^n subsets. Many heuristic algorithms have been studied for this problem so far. A hyper-heuristic is a newer heuristic approach that can search the solution space effectively by applying local searches appropriately, where each local search is a neighborhood search algorithm. Since each region of the solution space can have its own characteristics, an appropriate local search should be chosen and applied to the current solution. This task is assigned to a supervisor, which chooses a local search based on the performance history of the local searches. In doing so, it can trade off exploitation against exploration. Because existing heuristics cannot balance exploration and exploitation appropriately, they do not search the solution space adequately and thus have a low convergence rate. This paper is the first to use a hyper-heuristic approach to find an efficient feature subset. In the proposed method, a genetic algorithm is used as the supervisor and 66 heuristic algorithms are used as local searches. An empirical study of the proposed method on several commonly used data sets from the UCI repository indicates that it outperforms recent existing methods in the literature for feature selection.
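The abstract gives the overall scheme but not the paper's 66 local searches, its chromosome encoding, or its fitness function, so the following is only a minimal sketch under stated assumptions. A genetic-algorithm supervisor evolves short sequences of local-search indices; each sequence is applied to the current feature mask (accepting only non-worsening moves), and a sequence's fitness is the quality it reaches, so sequences that improved the solution are more likely to be bred again. The three local searches (`flip_one`, `swap_in_out`, `flip_block`), the function `hyper_heuristic_ga` and its parameters, and the objective `toy_fitness` are all hypothetical; a real wrapper would score a mask by classifier accuracy on a data set, and would avoid enumerating all 2^n masks (about 10^30 for n = 100).

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


# Hypothetical low-level heuristics ("local searches"); the paper uses 66.
def flip_one(mask: Mask, rng: random.Random) -> Mask:
    """Toggle a single randomly chosen feature."""
    m = mask[:]
    m[rng.randrange(len(m))] ^= 1
    return m


def swap_in_out(mask: Mask, rng: random.Random) -> Mask:
    """Drop one selected feature and add one unselected feature."""
    m = mask[:]
    ones = [i for i, b in enumerate(m) if b]
    zeros = [i for i, b in enumerate(m) if not b]
    if ones and zeros:
        m[rng.choice(ones)] = 0
        m[rng.choice(zeros)] = 1
    return m


def flip_block(mask: Mask, rng: random.Random, width: int = 3) -> Mask:
    """Toggle a run of up to `width` adjacent features (a larger jump)."""
    m = mask[:]
    start = rng.randrange(len(m))
    for i in range(start, min(start + width, len(m))):
        m[i] ^= 1
    return m


HEURISTICS = [flip_one, swap_in_out, flip_block]


def hyper_heuristic_ga(fitness: Callable[[Mask], float], n_features: int,
                       generations: int = 30, pop_size: int = 10,
                       seq_len: int = 5, mutation_rate: float = 0.1,
                       seed: int = 0) -> Tuple[Mask, float]:
    """GA supervisor that evolves sequences of local-search indices."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_features)]
    current_fit = fitness(current)

    def apply_sequence(seq: List[int], sol: Mask, fit: float) -> Tuple[Mask, float]:
        # Apply the local searches in order; keep only non-worsening moves.
        for h in seq:
            cand = HEURISTICS[h](sol, rng)
            cand_fit = fitness(cand)
            if cand_fit >= fit:
                sol, fit = cand, cand_fit
        return sol, fit

    def tournament(pop: List[List[int]], scores: List[float]) -> List[int]:
        a, b = rng.sample(range(len(pop)), 2)
        return pop[a] if scores[a] >= scores[b] else pop[b]

    pop = [[rng.randrange(len(HEURISTICS)) for _ in range(seq_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Score each sequence by the quality it reaches from the current mask.
        results = [apply_sequence(seq, current, current_fit) for seq in pop]
        scores = [fit for _, fit in results]
        best = max(range(pop_size), key=lambda i: scores[i])
        current, current_fit = results[best]  # never worse than before

        next_pop = [pop[best][:]]  # elitism: keep the best sequence
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            cut = rng.randrange(1, seq_len)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(seq_len):  # point mutation of heuristic indices
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(HEURISTICS))
            next_pop.append(child)
        pop = next_pop
    return current, current_fit


# Hypothetical objective: features 0, 2 and 5 are "relevant"; every other
# selected feature costs 0.1. A real wrapper would use classifier accuracy.
RELEVANT = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[i] for i in RELEVANT)
    return hits - 0.1 * (sum(mask) - hits)


if __name__ == "__main__":
    best_mask, best_fit = hyper_heuristic_ga(toy_fitness, n_features=8)
    print(best_mask, best_fit)
```

Evolving whole sequences, rather than picking one local search at a time, lets the supervisor discover useful combinations, for example a large `flip_block` jump (exploration) followed by fine `flip_one` moves (exploitation).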
