StARformer: Transformer with State-Action-Reward Representations

Reinforcement Learning (RL) can be cast as a sequence modeling task: given a sequence of past state-action-reward experiences, a model autoregressively predicts a sequence of future actions. Transformers have recently been adopted for this problem with success. In this work, we propose the State-Action-Reward Transformer (StARformer), which explicitly models local causal relations to improve action prediction over long sequences. StARformer first extracts a local representation (a StAR-representation) from each group of state-action-reward tokens within a short time span. A sequence of these local representations, combined with state representations, is then used to make action predictions over a long time span. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on Atari (image) and Gym (state vector) benchmarks, in both offline-RL and imitation-learning settings. StARformer also handles longer input sequences better than the baseline. Our code is available at https://github.com/elicassion/StARformer.
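To make the two-stage design concrete, below is a minimal PyTorch sketch of the idea: a small Transformer fuses each timestep's state-action-reward tokens into one StAR-representation, and a causally masked Transformer then consumes the interleaved sequence of StAR-representations and state representations to predict actions. The module names, the mean-pooling step, and all dimensions are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the grouping-then-interleaving structure described in the
# abstract. Names (StepGroupEncoder, CausalSequenceModel), pooling choice, and
# dimensions are assumptions made for this example.
import torch
import torch.nn as nn

class StepGroupEncoder(nn.Module):
    """Stage 1: fuse the state, action, and reward tokens of one timestep
    (a short-span group) into a single local StAR-representation."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, group_tokens):          # (B*T, G, d): G tokens per step
        fused = self.encoder(group_tokens)    # attention stays inside the group
        return fused.mean(dim=1)              # pool to one token per timestep

class CausalSequenceModel(nn.Module):
    """Stage 2: a causally masked Transformer over the long sequence of
    interleaved StAR-representations and state representations."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2, act_dim=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(d_model, act_dim)

    def forward(self, seq):                   # (B, 2T, d) interleaved tokens
        L = seq.size(1)                       # standard causal (upper-tri) mask
        mask = torch.triu(torch.full((L, L), float('-inf'),
                                     device=seq.device), diagonal=1)
        h = self.encoder(seq, mask=mask)
        return self.action_head(h[:, 1::2])   # read actions at the state slots

# Usage: B trajectories of T steps; each step contributes a group of G = 3
# embedded tokens (state, action, reward) plus a separate state embedding.
B, T, G, d = 2, 8, 3, 64
groups = torch.randn(B * T, G, d)
states = torch.randn(B, T, d)

star = StepGroupEncoder(d)(groups).view(B, T, d)        # local representations
seq = torch.stack([star, states], dim=2).flatten(1, 2)  # interleave -> (B, 2T, d)
action_logits = CausalSequenceModel(d)(seq)             # (B, T, act_dim)
```

The full model additionally tokenizes image states into ViT-style patches and lets the local and global streams interact across layers; the sketch keeps only the core structure of grouping short-span tokens and predicting actions over the long interleaved sequence.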
