Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning

Learning in a multi-target environment without prior knowledge about the targets requires a large number of samples and makes generalization difficult. To solve this problem, the agent must be able to discriminate between targets through semantic understanding. In this paper, we propose a goal-aware cross-entropy (GACE) loss, which can be used in a self-supervised way with auto-labeled goal states alongside reinforcement learning. Based on this loss, we then devise goal-discriminative attention networks (GDAN), which use the goal-relevant information to focus on the given instruction. We evaluate the proposed methods on visual navigation and robot arm manipulation tasks in multi-target environments and show that GDAN outperforms state-of-the-art methods in terms of task success ratio, sample efficiency, and generalization. Additionally, qualitative analyses demonstrate that the proposed method helps the agent become aware of and focus clearly on the given instruction, promoting goal-directed behavior.
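The idea of an auxiliary cross-entropy term trained on auto-labeled goal states can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the batch layout, the labeling rule (a state is labeled with the index of the target the agent reached) and the weighting factor `weight` are all assumptions made for illustration only.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a one-hot target at index `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def goal_aware_cross_entropy(batch):
    """Mean cross-entropy over a batch of auto-labeled goal states.

    `batch` is a list of (logits, target_index) pairs. Assumption: each pair
    comes from a state in which the agent reached a target, and target_index
    is the index of that target, so no manual annotation is needed.
    """
    if not batch:
        return 0.0  # no goal states collected yet, so no auxiliary signal
    return sum(softmax_cross_entropy(l, y) for l, y in batch) / len(batch)

def total_loss(rl_loss, gace_loss, weight=0.5):
    """Hypothetical combination: RL objective plus a weighted auxiliary term."""
    return rl_loss + weight * gace_loss

# Two stored goal states for a 4-target task (illustrative numbers).
goal_batch = [([2.0, 0.1, 0.1, 0.1], 0), ([0.0, 0.0, 0.0, 0.0], 2)]
aux = goal_aware_cross_entropy(goal_batch)
loss = total_loss(rl_loss=1.0, gace_loss=aux)
```

In this sketch the auxiliary term only pushes a goal classifier to tell targets apart. How the resulting features feed an attention mechanism conditioned on the instruction, as in GDAN, is outside its scope.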
