Automatic design of hyper-heuristic based on reinforcement learning

Abstract: Hyper-heuristics are a class of methodologies that automate the selection or generation of heuristics for solving optimization problems. A traditional hyper-heuristic model does this through a high-level heuristic with two key components: a heuristic selection method and a move acceptance method. The effectiveness of a high-level heuristic is highly problem dependent, because different problems have different landscape properties. Most current hyper-heuristic models formulate the high-level heuristic by manually matching combinations of these components. This article proposes a method that automatically designs the high-level heuristic of a hyper-heuristic model using reinforcement learning. More specifically, Q-learning guides the hyper-heuristic model in selecting suitable components at different stages of the optimization process. The proposed method is evaluated on benchmark instances from six problem domains in the Hyper-heuristic Flexible Framework (HyFlex). The experimental results show that the proposed method is comparable with most of the top-performing hyper-heuristic models in the current literature.
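As a rough illustration of the idea in the abstract, the sketch below uses tabular Q-learning to pick a (heuristic selection, move acceptance) pair at each step of a toy search. It is not the authors' implementation: the component names, the bit-string toy problem, the stage-based state, the reward (cost reduction), and all parameter values are assumptions chosen only to keep the example self-contained.

```python
import random
from itertools import product

# Hypothetical component pools; the abstract does not list the actual ones.
SELECTION = ["simple_random", "greedy"]
ACCEPTANCE = ["improving_only", "all_moves"]
ACTIONS = list(product(SELECTION, ACCEPTANCE))  # each action is a component pair


def cost(sol):
    """Toy objective (assumption): number of 1-bits, to be minimized."""
    return sum(sol)


def flip_random(sol, rng):
    new = sol[:]
    new[rng.randrange(len(new))] ^= 1
    return new


def clear_one(sol, rng):
    new = sol[:]
    ones = [i for i, bit in enumerate(new) if bit]
    if ones:
        new[rng.choice(ones)] = 0
    return new


LOW_LEVEL = [flip_random, clear_one]  # toy low-level heuristics


def apply_selection(name, sol, rng):
    """'greedy' tries every low-level heuristic; otherwise pick one at random."""
    if name == "greedy":
        return min((h(sol, rng) for h in LOW_LEVEL), key=cost)
    return rng.choice(LOW_LEVEL)(sol, rng)


def accept(name, old_cost, new_cost):
    return name == "all_moves" or new_cost <= old_cost


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over component pairs."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(q[state], key=q[state].get)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def run(iterations=300, n_bits=30, n_stages=3, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(n_stages)}
    sol = [1] * n_bits
    best = cost(sol)
    for t in range(iterations):
        # State (assumption): which stage of the search budget we are in.
        state = min(t * n_stages // iterations, n_stages - 1)
        selection, acceptance = action = choose_action(q, state, epsilon, rng)
        old = cost(sol)
        candidate = apply_selection(selection, sol, rng)
        if accept(acceptance, old, cost(candidate)):
            sol = candidate
        reward = old - cost(sol)  # reward (assumption): cost reduction achieved
        next_state = min((t + 1) * n_stages // iterations, n_stages - 1)
        q_update(q, state, action, reward, next_state)
        best = min(best, cost(sol))
    return best, q


if __name__ == "__main__":
    best, q = run()
    print("best cost:", best)
    for stage, values in q.items():
        print(stage, max(values, key=values.get))
```

Running the script prints the best cost found and, for each stage, the component pair with the highest learned Q-value. Using the search stage as the state is one simple way to let the preferred pair change over the run; other state definitions are equally plausible.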
