Know Your Action Set: Learning Action Relations for Reinforcement Learning

Intelligent agents can solve tasks in various ways depending on their available set of actions. However, conventional reinforcement learning (RL) assumes a fixed action set. This work asserts that tasks with varying action sets require reasoning about the relations between the available actions. For instance, taking a nail-action in a repair task is meaningful only if a hammer-action is also available. To learn and utilize such action relations, we propose a novel policy architecture consisting of a graph attention network over the available actions. We show that our model makes informed action decisions by correctly attending to other related actions in both value-based and policy-based RL. Consequently, it outperforms non-relational architectures on applications where the action space often varies, such as recommender systems and physical reasoning with tools and skills.
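To make the idea of attending over the available actions concrete, the following is a minimal sketch and not the authors' architecture. It assumes each action is already given as an embedding vector, treats the available actions as a fully connected graph, and replaces the learned graph attention network with a single parameter-free dot-product attention step. The function name `relational_action_probs` and all embeddings are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relational_action_probs(state, action_embs):
    """Score a variable-size set of available actions.

    Assumption: parameter-free dot-product attention stands in for the
    learned graph attention network described in the abstract.
    Returns (probs, attention), where attention[i][j] is how much
    action i attends to action j.
    """
    d = len(action_embs[0])
    # Each action attends over every available action, itself included.
    attention = [
        softmax([dot(a_i, a_j) / math.sqrt(d) for a_j in action_embs])
        for a_i in action_embs
    ]
    # Relational context: attention-weighted sum of the action embeddings.
    relational = [
        [sum(w * a_j[k] for w, a_j in zip(row, action_embs)) for k in range(d)]
        for row in attention
    ]
    # An action's logit depends on its own embedding and on its context,
    # so it changes when the available action set changes.
    logits = [dot(state, a) + dot(state, r) for a, r in zip(action_embs, relational)]
    return softmax(logits), attention


# Hypothetical embeddings for illustration only.
state = [1.0, 0.5, -0.5]
hammer = [1.0, 0.0, 1.0]
nail = [0.5, 1.0, 1.0]
glue = [-1.0, 0.5, 0.0]

probs_small, _ = relational_action_probs(state, [nail, glue])
probs_full, attn_full = relational_action_probs(state, [hammer, nail, glue])
```

Because the same function handles two or three actions, the output distribution always has one entry per available action, which is the property a fixed-size output layer lacks.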

Joseph J. Lim | Ayush Jain | Norio Kosaka | KyungHyun Kim
