Transformer in Transformer as Backbone for Deep Reinforcement Learning

Designing better deep networks and designing better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods built networks from several modules such as CNNs, LSTMs, and attention; more recent methods combine the Transformer with these modules for better performance. However, training a network composed of mixed modules requires tedious optimization tricks, which makes these methods inconvenient to use in practice. In this paper, we propose to design \emph{pure Transformer-based networks} for deep RL, aiming to provide off-the-shelf backbones for both the online and offline settings. Specifically, we propose the Transformer in Transformer (TIT) backbone, which cascades two Transformers in a natural way: the inner one processes a single observation, while the outer one processes the observation history; combining the two is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT consistently achieves satisfactory performance across different settings.
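
To make the cascade concrete, below is a minimal sketch of the TIT idea in PyTorch. This is an illustrative reconstruction, not the authors' reference implementation: the tokenization of a vector observation into per-feature tokens, the mean pooling, and all module names and hyperparameters are assumptions made for the example.

```python
# A minimal, hypothetical sketch of the TIT backbone: an inner Transformer
# encodes each observation on its own, and an outer Transformer attends
# causally over the resulting per-observation embeddings.
import torch
import torch.nn as nn


class TITBackbone(nn.Module):
    def __init__(self, obs_dim: int, d_model: int = 64, n_heads: int = 4,
                 inner_layers: int = 2, outer_layers: int = 2, max_len: int = 16):
        super().__init__()
        # Inner Transformer: treats each scalar feature of one observation as
        # a token (the vector-observation analogue of image patches).
        self.feature_embed = nn.Linear(1, d_model)
        inner_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.inner = nn.TransformerEncoder(inner_layer, num_layers=inner_layers)
        # Outer Transformer: processes the history of observation embeddings.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))
        outer_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.outer = nn.TransformerEncoder(outer_layer, num_layers=outer_layers)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, history_len, obs_dim)
        b, t, d = obs_seq.shape
        # Inner pass: encode each observation independently.
        tokens = obs_seq.reshape(b * t, d, 1)       # one token per feature
        tokens = self.feature_embed(tokens)         # (b*t, d, d_model)
        encoded = self.inner(tokens).mean(dim=1)    # pool -> (b*t, d_model)
        # Outer pass: causal attention over the observation history.
        seq = encoded.reshape(b, t, -1) + self.pos_embed[:, :t]
        causal = torch.triu(
            torch.full((t, t), float('-inf'), device=seq.device), diagonal=1)
        out = self.outer(seq, mask=causal)          # (b, t, d_model)
        return out[:, -1]                           # latest-step representation


# Usage: the last-step representation feeds a policy or value head.
backbone = TITBackbone(obs_dim=8)
history = torch.randn(32, 16, 8)                    # batch of 16-step histories
state_repr = backbone(history)                      # (32, 64)
```

Because the backbone is built entirely from standard Transformer encoder layers, it can be dropped in front of any actor-critic or Q-learning head without the module-mixing tricks the paragraph above describes.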
