Reinforcement Learning with Heterogeneous Policy Representations

In Reinforcement Learning (RL), the goal is to find a policy π that maximizes the expected future return, computed from a scalar reward function R(·) ∈ ℝ. The policy π determines which actions the RL agent performs. Traditionally, the RL problem is formulated as a Markov Decision Process (MDP) or a Partially Observable MDP (POMDP). In this formulation, the policy π is viewed as a mapping (π : s ↦ a) from a state s ∈ S to an action a ∈ A. This approach, however, suffers severely from the curse of dimensionality.
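
For reference, this objective can be sketched as the standard expected discounted return (a conventional formulation; the discount factor γ, horizon T, and per-step states and actions s_t, a_t are notational assumptions not introduced in the paragraph above):

$$
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, R(s_t, a_t)\right],
\qquad
\pi^{*} \;=\; \arg\max_{\pi}\, J(\pi)
$$

Under this reading, the curse of dimensionality arises because a tabular mapping π : s ↦ a must cover a state space S whose size grows exponentially with the number of state variables.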
