Paper Information - Robust Reinforcement Learning in Motion Planning

Robust Reinforcement Learning in Motion Planning

While exploring to find better solutions, an agent performing online reinforcement learning (RL) can perform worse than is acceptable. In some cases, exploration might have unsafe, or even catastrophic, results, often modeled in terms of reaching 'failure' states of the agent's environment. This paper presents a method that uses domain knowledge to reduce the number of failures during exploration. This method formulates the set of actions from which the RL agent composes a control policy to ensure that exploration is conducted in a policy space that excludes most of the unacceptable policies. The resulting action set has a more abstract relationship to the task being solved than is common in many applications of RL. Although the cost of this added safety is that learning may result in a suboptimal solution, we argue that this is an appropriate tradeoff in many problems. We illustrate this method in the domain of motion planning.
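The idea of restricting exploration to a safe action set can be illustrated with a toy sketch. The example below is not the paper's formulation: the grid, the failure cells, the rough cells, and the two controllers (`right_then_up`, `up_then_right`) are all hypothetical names chosen for illustration. It assumes each controller is safe by construction, so a tabular Q-learning agent that only chooses which controller to run never enters a failure cell, however it explores.

```python
import random

# Hypothetical 5x5 grid world (illustrative only, not from the paper).
SIZE = 5
FAILURE = {(1, 1), (2, 3), (3, 1)}   # cells the agent must never enter
ROUGH = {(4, 0), (4, 1), (4, 2)}     # safe cells that cost more to enter
START, GOAL = (0, 0), (4, 4)


def step_if_safe(state, move):
    """Apply a move only if it stays on the grid and avoids failure cells."""
    nxt = (state[0] + move[0], state[1] + move[1])
    in_bounds = 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
    return nxt if in_bounds and nxt not in FAILURE else state


def right_then_up(state):
    """Safe controller: prefer moving right, fall back to moving up."""
    nxt = step_if_safe(state, (1, 0))
    return nxt if nxt != state else step_if_safe(state, (0, 1))


def up_then_right(state):
    """Safe controller: prefer moving up, fall back to moving right."""
    nxt = step_if_safe(state, (0, 1))
    return nxt if nxt != state else step_if_safe(state, (1, 0))


# The agent's action set: it picks a controller, not a raw move.
CONTROLLERS = [right_then_up, up_then_right]


def train(episodes=200, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning over controller indices; also counts failures."""
    rng = random.Random(seed)
    q = {}
    failures = 0
    n = len(CONTROLLERS)
    for _ in range(episodes):
        state = START
        for _ in range(50):
            values = [q.get((state, a), 0.0) for a in range(n)]
            if rng.random() < epsilon:
                action = rng.randrange(n)            # explore
            else:
                action = values.index(max(values))   # exploit
            nxt = CONTROLLERS[action](state)
            failures += int(nxt in FAILURE)
            reward = -3.0 if nxt in ROUGH else -1.0
            best_next = 0.0 if nxt == GOAL else max(
                q.get((nxt, a), 0.0) for a in range(n))
            q[(state, action)] = values[action] + alpha * (
                reward + gamma * best_next - values[action])
            state = nxt
            if state == GOAL:
                break
    return q, failures


def greedy_path(q):
    """Follow the learned greedy policy from START and return visited cells."""
    state, path = START, [START]
    for _ in range(50):
        if state == GOAL:
            break
        values = [q.get((state, a), 0.0) for a in range(len(CONTROLLERS))]
        state = CONTROLLERS[values.index(max(values))](state)
        path.append(state)
    return path
```

Calling `train()` returns the learned Q-table together with a failure count that stays at zero, because no composition of these controllers can reach a failure cell. The tradeoff described in the abstract also shows up here: the agent can only find the best policy expressible with the two controllers, which need not be the best policy over raw moves.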
