Evolution of reward functions for reinforcement learning

The reward functions that drive reinforcement learning systems are generally derived directly from the descriptions of the problems that the systems are being used to solve. In some problem domains, however, alternative reward functions can allow systems to learn more quickly or more effectively. Here we describe work on the use of genetic programming to find novel reward functions that improve learning performance. We briefly present the core concepts of our approach, our motivations in developing it, and reasons to believe that it holds promise for producing highly capable adaptive technologies. Experimental results are presented and analyzed in our full report [8].
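
The approach is a two-level search: an outer evolutionary loop proposes candidate reward functions, an inner reinforcement learning loop trains an agent with each candidate, and each candidate's fitness is scored against the original objective rather than against its own reward signal. The sketch below is an illustration of that structure only, not the paper's implementation: the actual work evolves Push programs with PushGP [4, 6], whereas here a toy (1+λ)-style hill climb over linear reward weights in a 5×5 gridworld stands in, and every name (candidate_reward, train_agent, fitness, evolve_rewards) is hypothetical.

```python
# Minimal sketch: evolutionary search over reward functions for a Q-learning
# agent. Assumptions, not the paper's method: the paper evolves Push programs
# with PushGP; this sketch uses a (1+lambda)-style hill climb over three
# linear reward weights in a toy 5x5 gridworld with a sparse true objective.
import random

SIZE, GOAL, MAX_STEPS = 5, (4, 4), 60
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action):
    # Deterministic moves, clipped at the grid boundary.
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    return (x, y)

def candidate_reward(weights, state):
    # Stand-in for an evolved reward program: linear in simple state features.
    dist = abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])
    at_goal = 1.0 if state == GOAL else 0.0
    return weights[0] * at_goal - weights[1] * dist + weights[2]

def train_agent(weights, episodes=100, alpha=0.5, gamma=0.95, eps=0.1):
    # Inner loop: tabular Q-learning driven by the candidate reward.
    q = {}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(MAX_STEPS):
            a = (random.randrange(4) if random.random() < eps
                 else max(range(4), key=lambda i: q.get((s, i), 0.0)))
            s2 = step(s, ACTIONS[a])
            target = candidate_reward(weights, s2) + gamma * max(
                q.get((s2, i), 0.0) for i in range(4))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
            s = s2
            if s == GOAL:
                break
    return q

def fitness(weights, trials=3):
    # Outer objective: average steps-to-goal on the TRUE task, so a candidate
    # reward is judged by the behavior it induces, not by its own numbers.
    total = 0
    for _ in range(trials):
        q = train_agent(weights)
        s, steps = (0, 0), 0
        while s != GOAL and steps < MAX_STEPS:
            a = max(range(4), key=lambda i: q.get((s, i), 0.0))
            s = step(s, ACTIONS[a])
            steps += 1
        total += steps
    return -total / trials  # higher fitness = fewer steps

def evolve_rewards(generations=5, children=4, sigma=0.2):
    parent = [1.0, 0.0, 0.0]          # start from the raw objective reward
    best = fitness(parent)
    for g in range(generations):
        for _ in range(children):      # mutate, keep any improvement
            child = [w + random.gauss(0, sigma) for w in parent]
            f = fitness(child)
            if f > best:
                parent, best = child, f
        print(f"gen {g}: weights={[round(w, 2) for w in parent]} fitness={best:.1f}")
    return parent

if __name__ == "__main__":
    random.seed(0)
    evolve_rewards()
```

The one essential feature the sketch does preserve is that fitness is measured by steps-to-goal under the true task, not by the evolved reward itself; this is what lets the outer search discover shaping terms, such as a distance penalty, that accelerate learning without redefining success.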

References

[1] Christopher J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, 1989.

[2] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, 1998.

[3] Andrew Y. Ng, Daishi Harada, and Stuart Russell. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. In ICML, 1999.

[4] Lee Spector and Alan Robinson. Genetic Programming and Autoconstructive Evolution with the Push Programming Language. Genetic Programming and Evolvable Machines, 2002.

[5] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[6] Lee Spector, Jon Klein, and Maarten Keijzer. The Push3 execution stack and the evolution of control. In GECCO '05, 2005.

[7] Satinder Singh, Richard L. Lewis, and Andrew G. Barto. Where Do Rewards Come From? In Proceedings of the Annual Conference of the Cognitive Science Society, 2009.

[8] Scott Niekum, Andrew G. Barto, and Lee Spector. Genetic Programming for Reward Function Search. IEEE Transactions on Autonomous Mental Development, 2010.

[9] Satinder Singh, Richard L. Lewis, Andrew G. Barto, and Jonathan Sorg. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective. IEEE Transactions on Autonomous Mental Development, 2010.

[10] Leonardo Vanneschi et al. Guest editorial: special issue on parallel and distributed evolutionary algorithms, part two. Genetic Programming and Evolvable Machines, 2010.