Extending Policy Shaping to Continuous State Spaces (Student Abstract)

Policy Shaping (Griffith et al. 2013) is a Human-in-the-loop Reinforcement Learning (HRL) algorithm. We extend it to continuous state spaces with our algorithm, Deep Policy Shaping (DPS). DPS combines an RL algorithm with a feedback neural network that learns the optimality of actions from noisy feedback. In simulation, DPS outperforms or matches baselines when results are averaged over multiple hyperparameter settings and varying levels of feedback correctness.
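As a rough illustration of the idea being extended, the sketch below follows the tabular Policy Shaping rule of Griffith et al. (2013): feedback is turned into a per-action probability of optimality, which is then multiplied with the RL policy and renormalized. It is not the DPS implementation. The function names, the consistency value, and the example numbers are assumptions made for illustration, and in DPS the count-based estimate would presumably be replaced by the output of the feedback neural network, which the abstract does not specify.

```python
def feedback_optimality(delta, consistency):
    """Estimate P(action is optimal) from feedback counts, per action.

    delta[a] is (# positive labels) - (# negative labels) for action a.
    consistency is the assumed probability that a feedback label is
    correct; it must lie strictly between 0 and 1.
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must be strictly between 0 and 1")
    ratio = (1.0 - consistency) / consistency
    # Algebraically equal to C**d / (C**d + (1 - C)**d).
    return [1.0 / (1.0 + ratio ** d) for d in delta]


def shape_policy(rl_policy, feedback_policy):
    """Combine two action distributions by multiplying and renormalizing."""
    product = [p * q for p, q in zip(rl_policy, feedback_policy)]
    total = sum(product)
    if total == 0.0:
        # Hypothetical fallback: the two sources rule out every action.
        return [1.0 / len(product)] * len(product)
    return [x / total for x in product]


# Illustrative values only: three actions, a uniform RL policy,
# and feedback that favours the first action.
rl_policy = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
feedback = feedback_optimality(delta=[2, 0, -1], consistency=0.8)
shaped = shape_policy(rl_policy, feedback)
print(shaped)
```

An action with no net feedback keeps a neutral estimate of 0.5, so the shaped policy falls back on the RL policy wherever feedback is absent or balanced.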

Andrea Lockerd Thomaz | Taylor A. Kessler Faulkner | Thomas Benjamin Wei

[1] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.

[2] Peter Stone, et al. Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces, 2017, AAAI.

[3] Peter Stone, et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.

[4] Andrea Lockerd Thomaz, et al. Policy Shaping: Integrating Human Feedback with Reinforcement Learning, 2013, NIPS.

[5] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[6] Geoffrey E. Hinton, et al. Learning to Label Aerial Images from Noisy Data, 2012, ICML.

[7] Radha Poovendran, et al. FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback, 2020, AAMAS.

[8] Yuta Tsuboi, et al. DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback, 2018, ArXiv.