Analysis of Q-learning with random exploration for selection of auxiliary objectives in random local search

We perform a theoretical analysis of a previously proposed method for enhancing the performance of an evolutionary algorithm with reinforcement learning. The method uses reinforcement learning to adaptively choose between auxiliary objectives in a single-objective evolutionary algorithm. We consider the Q-learning algorithm with the ε-greedy strategy (ε > 0) on a benchmark problem based on ONEMAX, with Random Local Search as the evolutionary algorithm. In our setting, the ONEMAX problem must be solved in the presence of the obstructive ZEROMAX objective, so the benchmark tests the ability of the reinforcement learning algorithm to ignore this inefficient objective. It was previously shown that with the greedy strategy (ε = 0), the considered algorithm solves this benchmark in the best time achievable by a conventional evolutionary algorithm. With the ε-greedy strategy, however, the algorithm turns out to require exponential time. Furthermore, we show that any selection algorithm which selects the inefficient auxiliary objective with probability at least δ is asymptotically inefficient whenever δ > 0 is a constant.
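For concreteness, the sketch below illustrates the kind of algorithm the abstract analyzes: Random Local Search on a bit string, with a Q-learning agent using an ε-greedy strategy to choose, at each iteration, whether offspring are accepted according to ONEMAX or the obstructive ZEROMAX objective. This is a minimal illustration, not the authors' implementation; in particular, the state and reward design (the current ONEMAX value as the state, the change in ONEMAX as the reward) and all parameter values are assumptions made here for the example.

```python
import random

def onemax(x):
    """Target objective: number of one-bits."""
    return sum(x)

def zeromax(x):
    """Obstructive auxiliary objective: number of zero-bits."""
    return len(x) - sum(x)

def rls_with_q_learning(n=50, epsilon=0.1, alpha=0.5, gamma=0.9,
                        max_iters=100_000):
    objectives = [onemax, zeromax]
    x = [random.randint(0, 1) for _ in range(n)]
    q = {}  # Q-values indexed by (state, action); default 0.0

    for _ in range(max_iters):
        state = onemax(x)
        if state == n:  # target objective optimized
            return x

        # epsilon-greedy selection of the objective to optimize this step
        if random.random() < epsilon:
            action = random.randrange(len(objectives))
        else:
            action = max(range(len(objectives)),
                         key=lambda a: q.get((state, a), 0.0))

        # RLS mutation: flip exactly one uniformly chosen bit
        y = x[:]
        i = random.randrange(n)
        y[i] = 1 - y[i]

        # accept the offspring if the *selected* objective does not decrease
        f = objectives[action]
        if f(y) >= f(x):
            x = y

        # reward: progress on the target objective ONEMAX (an assumption here)
        next_state = onemax(x)
        reward = next_state - state
        best_next = max(q.get((next_state, a), 0.0)
                        for a in range(len(objectives)))
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return x

if __name__ == "__main__":
    result = rls_with_q_learning(n=30, epsilon=0.0)  # greedy strategy
    print("ONEMAX value reached:", onemax(result))
```

In this sketch, setting ε = 0 lets the agent settle on ONEMAX once it has been rewarded for it, while any constant ε > 0 forces the obstructive ZEROMAX objective to be selected a constant fraction of the time, which is the effect the abstract's exponential-time result captures.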
