Evolutionary Machine Learning With Minions: A Case Study in Feature Selection

Many decisions in a machine learning (ML) pipeline involve nondifferentiable and discontinuous objectives and search spaces. Examples include feature selection, model selection, and hyperparameter tuning, where candidate solutions in an outer optimization loop must be evaluated via a learning subsystem. Evolutionary algorithms (EAs) are prominent gradient-free methods for such tasks. However, EAs are known to be computationally expensive, especially on datasets with a large number of instances. As opposed to prior work, which often falls back on parallel computing hardware to resolve this big data problem of EAs, in this article we propose a novel algorithm-centric solution based on evolutionary multitasking. Our approach creates a band of minions, i.e., small-data proxies of the main target task, constructed by subsampling a fraction of the large dataset. We then combine the minions with the main task in a single multitask optimization framework, accelerating evolutionary search by using the small data to quickly optimize for the large dataset. Our key algorithmic contribution in this setting is a principled allocation of computational resources among the tasks. The article considers wrapper-based feature selection as an illustrative case study of the broader idea of using multitasking to speed up outer-loop evolutionary configuration of any ML subsystem. The experiments reveal that multitasking can indeed speed up baseline EAs, by more than 40% on some datasets.
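To make the idea concrete, below is a minimal sketch (in Python with scikit-learn) of wrapper-based feature selection with minion tasks: small subsampled proxies evolve alongside the full-data target task and periodically inject their best feature masks into it. This is an illustrative assumption of how such a scheme could look, not the authors' algorithm; the GA operators, the fixed transfer schedule, and all names are hypothetical, and the principled resource-allocation mechanism described above is omitted.

```python
# A minimal sketch of the "minions" idea (illustrative, not the authors'
# algorithm): subsampled proxy tasks share promising feature masks with
# the full-data target task during a wrapper-based feature-selection GA.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Wrapper fitness: mean CV accuracy of a classifier on the selected features."""
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def make_minion(X, y, frac=0.1):
    """Build a small-data proxy task by subsampling a fraction of the instances."""
    idx = rng.choice(len(X), size=max(1, int(frac * len(X))), replace=False)
    return X[idx], y[idx]

def evolve(tasks, n_features, pop_size=20, gens=15, transfer_every=5):
    """Evolve one binary-mask population per task; tasks[0] is the full-data
    target. Minion evaluations are cheap, and each minion's best mask is
    periodically injected into the target population (knowledge transfer)."""
    pops = [rng.random((pop_size, n_features)) < 0.5 for _ in tasks]
    for g in range(1, gens + 1):
        for t, (X, y) in enumerate(tasks):
            fits = np.array([fitness(m, X, y) for m in pops[t]])
            parents = pops[t][fits.argsort()[-pop_size // 2:]]  # truncation selection
            kids = []
            for _ in range(pop_size):
                a, b = parents[rng.integers(len(parents), size=2)]
                child = np.where(rng.random(n_features) < 0.5, a, b)  # uniform crossover
                child ^= rng.random(n_features) < 1.0 / n_features    # bit-flip mutation
                kids.append(child)
            pops[t] = np.array(kids)
        if g % transfer_every == 0:
            for t in range(1, len(tasks)):  # minion -> target transfer
                Xm, ym = tasks[t]
                best = max(pops[t], key=lambda m: fitness(m, Xm, ym))
                pops[0][rng.integers(pop_size)] = best
    X0, y0 = tasks[0]
    return max(pops[0], key=lambda m: fitness(m, X0, y0))

X, y = make_classification(n_samples=2000, n_features=30, n_informative=8, random_state=0)
tasks = [(X, y)] + [make_minion(X, y) for _ in range(2)]
print("selected features:", np.flatnonzero(evolve(tasks, X.shape[1])))
```

In this sketch the speedup comes from the inner generational loop: most fitness calls hit the 10% subsamples, while the expensive full-data evaluations are amortized across transfers. A full treatment would also decide, online, how many evaluations each task deserves.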
