Paper information - Automatic design of hyper-heuristic based on reinforcement learning

Abstract: A hyper-heuristic is a methodology that automates the selection or generation of heuristics for solving optimization problems. A traditional hyper-heuristic model does this through a high-level heuristic with two key components: a heuristic selection method and a move acceptance method. Because landscape properties differ from problem to problem, the effectiveness of a high-level heuristic is highly problem dependent. Most current hyper-heuristic models build the high-level heuristic by manually matching combinations of components. This article proposes a method that designs the high-level heuristic of a hyper-heuristic model automatically using reinforcement learning. Specifically, Q-learning guides the hyper-heuristic model in selecting suitable components at different stages of the optimization process. The method is evaluated on benchmark instances from six problem domains in the Hyper-heuristic Flexible Framework (HyFlex). The experimental results show that it is comparable with most of the top-performing hyper-heuristic models in the current literature.
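To make the idea in the abstract concrete, the following is a minimal sketch of tabular Q-learning choosing a (heuristic selection, move acceptance) pair at each stage of a search. It is an illustration under stated assumptions, not the authors' implementation: the component names, the two-stage state set ("early", "late"), the reward function, and the parameter values are all hypothetical, since the abstract does not specify them.

```python
import random
from itertools import product

# Hypothetical component names; the abstract does not list the actual ones.
SELECTION_METHODS = ["simple_random", "greedy", "choice_function"]
ACCEPTANCE_METHODS = ["all_moves", "only_improving", "simulated_annealing"]
# Each action is one candidate high-level heuristic: a (selection, acceptance) pair.
ACTIONS = list(product(SELECTION_METHODS, ACCEPTANCE_METHODS))


class QLearningDesigner:
    """Tabular Q-learning over (selection, acceptance) pairs."""

    def __init__(self, states, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        self.q = {(s, a): 0.0 for s in states for a in self.actions}

    def choose(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def toy_reward(state, action):
    # Made-up reward for illustration: pretend one pair suits each stage.
    best = {
        "early": ("simple_random", "all_moves"),
        "late": ("greedy", "only_improving"),
    }
    return 1.0 if action == best[state] else 0.0


def train(designer, episodes=200):
    # Alternate between the two hypothetical stages and learn from the toy reward.
    for _ in range(episodes):
        for state, next_state in (("early", "late"), ("late", "early")):
            action = designer.choose(state)
            designer.update(state, action, toy_reward(state, action), next_state)
    return designer
```

In a real hyper-heuristic the reward would come from the observed change in solution quality after applying the chosen components for a period, and the state would encode whatever search information the designer tracks; both are replaced by stand-ins here.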

Chee Peng Lim | Li-Pei Wong | Shin Siang Choong
