Multi-objective optimisation by reinforcement learning

This paper presents a multi-objective optimisation method based on reinforcement learning, called MORL, for solving complex multi-objective optimisation problems, in particular those in high-dimensional spaces. In MORL, the search proceeds along individual dimensions of the high-dimensional space, following a path selected according to its estimated path value. Path values are estimated by weighting the state values on the selected path. They represent the potential of finding a better solution by searching along that path, and they serve as a memory of the quality of previously visited states. Each visited state is assigned an immediate reward by comparing the objective vector of the current state with those of the Pareto optimal solutions found previously. These Pareto optimal solutions are stored in an elite list, which keeps track of the non-dominated solutions found so far and is used to construct the Pareto front at the end of the optimisation process. MORL is compared with a promising multi-objective evolutionary algorithm based on decomposition (MOEA/D) on four widely used benchmark functions. The simulation results demonstrate that MORL is superior to MOEA/D with respect to the accuracy and the range of the Pareto fronts, especially in solving high-dimensional multi-objective optimisation problems.
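The elite list and the dominance-based reward described above can be illustrated with a minimal sketch. It assumes all objectives are minimised. The function names `dominates` and `update_elite` and the reward values 1.0, 0.5 and 0.0 are hypothetical choices made for illustration only; the abstract does not specify the reward scheme actually used in MORL.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Pareto dominance for minimisation: a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_elite(elite: List[Vector], candidate: Vector) -> Tuple[List[Vector], float]:
    """Compare a candidate objective vector with the elite list.

    Returns the updated elite list and an immediate reward.
    The reward values are assumptions, not taken from the paper.
    """
    # Candidate is dominated by (or duplicates) an elite member: reject it.
    if any(dominates(e, candidate) or e == candidate for e in elite):
        return elite, 0.0
    # Drop elite members that the candidate dominates.
    survivors = [e for e in elite if not dominates(candidate, e)]
    # Higher reward if the candidate displaced at least one elite member.
    reward = 1.0 if len(survivors) < len(elite) else 0.5
    return survivors + [candidate], reward


elite: List[Vector] = []
rewards = []
for point in [(2.0, 2.0), (1.0, 3.0), (1.0, 1.0), (3.0, 3.0)]:
    elite, r = update_elite(elite, point)
    rewards.append(r)
```

In this sketch the list always holds mutually non-dominated vectors, so at the end of a run it can be read directly as an approximation of the Pareto front.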
