A Novel Genetic Algorithm for Feature Selection in Hierarchical Feature Spaces

Feature selection methods are widely used to prepare high-dimensional feature spaces for the classification task of data mining. In many real-world datasets, however, the feature space consists of binary features related by generalization-specialization relationships; such feature spaces are known as hierarchical feature spaces. Although there are many methods for the traditional feature selection problem, methods that properly take these hierarchical relationships into account remain largely underexplored. In this work, we propose a novel genetic algorithm (GA) for hierarchical feature selection. The proposed GA has two novel hierarchical mutation operators tailored to deal with redundant features in hierarchical feature spaces. Computational experiments show that the proposed approach achieved better predictive performance than two state-of-the-art hierarchical feature selection methods (SHSEL and HIP) and than two traditional feature selection methods (ReliefF and CFS).
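
The following Python sketch makes the idea of redundancy in a hierarchical feature space concrete. It assumes one common definition of hierarchical redundancy: a selected binary feature is redundant if one of its ancestors (a more general feature) is also selected, because a positive value for a specific feature implies positive values for its generalizations. The toy hierarchy `PARENTS` and the operator `redundancy_aware_mutation` are hypothetical illustrations only. The code is not either of the two mutation operators proposed in this work, whose details are not given here.

```python
import random
from typing import Dict, List, Set

# Hypothetical toy hierarchy (a DAG): each feature maps to its more general parents.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}


def ancestors(f: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All features that generalize f (transitive closure over parents)."""
    seen: Set[str] = set()
    stack = list(parents[f])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def descendants(f: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All features that specialize f."""
    return {g for g in parents if f in ancestors(g, parents)}


def is_hierarchically_redundant(selected: Set[str], parents: Dict[str, List[str]]) -> bool:
    """True if some selected feature also has a selected ancestor."""
    return any(ancestors(f, parents) & selected for f in selected)


def redundancy_aware_mutation(
    chromosome: Dict[str, bool], parents: Dict[str, List[str]], rng: random.Random
) -> Dict[str, bool]:
    """Hypothetical operator: flip one random gene; if it is switched on,
    switch off all of its ancestors and descendants."""
    f = rng.choice(sorted(chromosome))
    mutated = dict(chromosome)
    mutated[f] = not mutated[f]
    if mutated[f]:
        for g in ancestors(f, parents) | descendants(f, parents):
            mutated[g] = False
    return mutated


if __name__ == "__main__":
    rng = random.Random(0)
    chrom = {f: False for f in PARENTS}
    chrom["A"] = True
    chrom["B1"] = True
    for _ in range(5):
        chrom = redundancy_aware_mutation(chrom, PARENTS, rng)
        selected = {f for f, on in chrom.items() if on}
        print(sorted(selected), is_hierarchically_redundant(selected, PARENTS))
```

Because switching a feature on also switches off its ancestors and descendants, a chromosome that starts free of hierarchical redundancy stays free of it after any number of these mutations. A complete GA would combine such an operator with selection, crossover, and a classifier-based fitness function.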
