Know Your Action Set: Learning Action Relations for Reinforcement Learning

Intelligent agents can solve tasks in various ways depending on their available set of actions. However, conventional reinforcement learning (RL) assumes a fixed action set. This work asserts that tasks with varying action sets require reasoning about the relations between the available actions. For instance, taking a nail-action in a repair task is meaningful only if a hammer-action is also available. To learn and utilize such action relations, we propose a novel policy architecture consisting of a graph attention network over the available actions. We show that our model makes informed action decisions by correctly attending to other related actions in both value-based and policy-based RL. Consequently, it outperforms non-relational architectures on applications where the action space often varies, such as recommender systems and physical reasoning with tools and skills.
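The idea of attending over the available actions can be illustrated with a minimal sketch. It is not the architecture from the paper: it assumes a single attention head, a fully connected graph over the available actions, no learned projection matrix, and a plain dot product between the state and each relational action feature as the score. The names `gat_layer`, `action_probs`, and `attn_vec` are hypothetical and introduced only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(action_feats, attn_vec):
    """One simplified graph-attention pass over the available actions.

    action_feats: list of k feature vectors (lists of length d).
    attn_vec: attention parameters of length 2*d (assumed fixed here,
              learned in a real model).
    Returns (relational_feats, attention), where attention[i][j] is how
    much action i attends to action j.
    """
    k = len(action_feats)
    d = len(action_feats[0])
    relational, attention = [], []
    for i in range(k):
        # Additive attention logit on the concatenated pair (i, j).
        logits = [
            leaky_relu(sum(a * x for a, x in
                           zip(attn_vec, action_feats[i] + action_feats[j])))
            for j in range(k)
        ]
        weights = softmax(logits)
        attention.append(weights)
        # Each action's new feature is a weighted mix of all available actions.
        relational.append([
            sum(weights[j] * action_feats[j][c] for j in range(k))
            for c in range(d)
        ])
    return relational, attention


def action_probs(state, action_feats, attn_vec):
    """Policy over whichever actions are currently available."""
    relational, _ = gat_layer(action_feats, attn_vec)
    scores = [sum(s * r for s, r in zip(state, rel)) for rel in relational]
    return softmax(scores)


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three available actions
    attn_vec = [0.1, -0.2, 0.3, 0.4]              # illustrative values only
    state = [1.0, 2.0]
    print(action_probs(state, feats, attn_vec))       # all three available
    print(action_probs(state, feats[:2], attn_vec))   # third action removed
```

Because each action's relational feature depends on every other available action, removing an action changes the representation, and therefore the score, of the actions that remain. The same function handles action sets of any size without changing its parameters.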
