Incremental Refinement of Solutions for Dynamic Multi-Objective Optimization Problems

MDQL is a reinforcement learning based algorithm for solving multiple-objective optimization problems that has been tested on several applications with promising results. MDQL discretizes the decision variables into a set of states, each associated with actions that move agents to contiguous states. A group of agents explores this state space and finds Pareto sets by applying a distributed reinforcement learning algorithm. The precision of the Pareto solutions depends on the chosen granularity of the states: a finer granularity yields more precise solutions, but at the expense of a larger search space and, consequently, greater computational cost. In this paper, an important improvement is introduced into the original MDQL algorithm to incrementally refine the Pareto solutions. The new algorithm, called IMDQL, starts with a coarse granularity to find an initial Pareto set. A vicinity of each Pareto solution is then refined, and a new Pareto set is found in this refined state space. The process continues until the improvement falls below a small threshold value. It is shown that IMDQL not only improves the solutions found by MDQL, but also converges faster. MDQL has also been tested on dynamic optimization problems. In this paper, it is also shown that the adaptation capabilities observed in MDQL can be improved with IMDQL. IMDQL was tested on the benchmark problems proposed by Jin, and performance was evaluated using the Collective Mean Fitness metric proposed by Morrison. IMDQL was compared against a standard evolution strategy with covariance matrix adaptation (CMA-ES), with very promising results.
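The abstract describes the refinement loop only at a high level. The sketch below illustrates that idea under stated assumptions: discretize the decision space on a coarse grid, keep the non-dominated cells, then re-discretize a shrinking vicinity around each of them until the improvement falls below a threshold. It deliberately omits MDQL's distributed Q-learning agents and the dynamic objectives; the two-objective test function, all function names, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of grid-based incremental refinement of a Pareto set.
# Assumption: in the actual IMDQL algorithm the Pareto set at each granularity
# is found by distributed Q-learning agents, not by exhaustive grid evaluation.
import itertools

import numpy as np


def objectives(X):
    """Two convex objectives on [0, 2]^2; a stand-in for a benchmark problem."""
    f1 = np.sum(X ** 2, axis=1)
    f2 = np.sum((X - 2.0) ** 2, axis=1)
    return np.column_stack([f1, f2])


def pareto_mask(F):
    """Boolean mask of the non-dominated rows of the objective matrix F."""
    mask = np.ones(len(F), dtype=bool)
    for i, fi in enumerate(F):
        dominated = np.any(np.all(F <= fi, axis=1) & np.any(F < fi, axis=1))
        mask[i] = not dominated
    return mask


def grid(lower, upper, cells_per_dim):
    """Cell centres of a uniform discretization of the box [lower, upper]."""
    axes = [np.linspace(lo, hi, cells_per_dim) for lo, hi in zip(lower, upper)]
    return np.array(list(itertools.product(*axes)))


def incremental_refinement(lower, upper, cells_per_dim=6, shrink=0.5,
                           eps=1e-3, max_rounds=6):
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    X = grid(lower, upper, cells_per_dim)        # coarse initial discretization
    F = objectives(X)
    keep = pareto_mask(F)
    X, F = X[keep], F[keep]                      # initial (coarse) Pareto set
    best = F.sum(axis=1).min()                   # crude scalar progress measure
    width = (upper - lower) * shrink             # size of the refined vicinity
    for _ in range(max_rounds):
        # Re-discretize a small box around every current Pareto solution.
        boxes = [grid(np.maximum(lower, x - width / 2),
                      np.minimum(upper, x + width / 2),
                      cells_per_dim) for x in X]
        Xc = np.vstack([X] + boxes)              # previous front + refined cells
        Fc = objectives(Xc)
        keep = pareto_mask(Fc)
        X, F = Xc[keep], Fc[keep]
        new_best = F.sum(axis=1).min()
        if best - new_best < eps:                # stop when refinement stalls
            break
        best = new_best
        width *= shrink                          # finer granularity next round
    return X, F


if __name__ == "__main__":
    X, F = incremental_refinement([0.0, 0.0], [2.0, 2.0])
    print(f"{len(X)} non-dominated solutions, best f1 + f2 = {F.sum(axis=1).min():.4f}")
```

The stopping rule mirrors the one stated in the abstract: refinement rounds continue only while they improve the front by more than a small threshold, so the cost of finer granularity is paid only in the vicinity of already promising solutions rather than over the whole search space.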

[1] Aaas News, et al. Book Reviews, 1893, Buffalo Medical and Surgical Journal.

[2] Petros Koumoutsakos, et al. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES), 2003, Evolutionary Computation.

[3] Yaochu Jin, et al. Dynamic Weighted Aggregation for evolutionary multi-objective optimization: why does it work and how?, 2001.

[4] Peter Dayan, et al. Q-learning, 1992, Machine Learning.

[5] R. W. Morrison, et al. A test problem generator for non-stationary environments, 1999, Proceedings of the 1999 Congress on Evolutionary Computation, CEC99 (Cat. No. 99TH8406).

[6] Bernhard Sendhoff, et al. Constructing Dynamic Optimization Test Problems Using the Multi-objective Optimization Concept, 2004, EvoWorkshops.

[7] Eduardo F. Morales, et al. DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning, 2001, ECML.

[8] Jürgen Branke, et al. Evolutionary Optimization in Dynamic Environments, 2001, Genetic Algorithms and Evolutionary Computation.

[9] Chris Watkins, et al. Learning from delayed rewards, 1989.

[10] Eduardo F. Morales, et al. A New Approach for the Solution of Multiple Objective Optimization Problems Based on Reinforcement Learning, 2000, MICAI.

[11] Hans-Paul Schwefel, et al. Evolution and optimum seeking, 1995, Sixth-generation computer technology series.

[12] Jürgen Branke, et al. Proceedings of the Workshop on Evolutionary Algorithms for Dynamic Optimization Problems (EvoDOP-2003), held in conjunction with the Genetic and Evolutionary Computation Conference (GECCO-2003), 12 July 2003, Chicago, USA [online], 2003.

[13] Eduardo F. Morales, et al. Multi-objective optimization of water-using systems, 2007, Eur. J. Oper. Res.