Reinforcement Learning in Factored Action Spaces using Tensor Decompositions

We present an extended abstract of the previously published work TESSERACT [Mahajan et al., 2021], which proposes a novel approach to Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions. The goal of this abstract is twofold: (1) to garner greater interest within the tensor research community in developing methods and analysis for approximate RL, and (2) to elucidate the generalised setting of factored action spaces in which tensor decompositions can be used. We use cooperative multi-agent reinforcement learning as the exemplary setting: the action space is naturally factored across agents, and learning becomes intractable unless the underlying hypothesis space of candidate solutions is approximated.
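To make the factored setting concrete, the sketch below treats the joint action-value function for a fixed state as a tensor with one mode per agent and approximates it with a rank-k CP (CANDECOMP/PARAFAC) model. This is an illustration under assumptions, not the construction used in TESSERACT: the choice of CP, the names `cp_q_value`, `greedy_joint_action`, and `parameter_counts`, and the layout `factors[i][a][r]` are all hypothetical.

```python
import itertools
import math


def cp_q_value(factors, weights, joint_action):
    """Q-value of one joint action under a rank-k CP model (illustrative).

    factors[i][a][r]: assumed factor entry for agent i, action a, component r.
    weights[r]: assumed weight of rank-1 component r.
    """
    total = 0.0
    for r, w in enumerate(weights):
        # Each component contributes a product of one factor entry per agent.
        total += w * math.prod(
            factors[i][a][r] for i, a in enumerate(joint_action)
        )
    return total


def greedy_joint_action(factors, weights):
    """Brute-force argmax over all joint actions; viable only for tiny problems."""
    action_ranges = [range(len(agent_factors)) for agent_factors in factors]
    return max(
        itertools.product(*action_ranges),
        key=lambda u: cp_q_value(factors, weights, u),
    )


def parameter_counts(n_agents, n_actions, rank):
    """Entries in the full joint tensor versus parameters in the CP model."""
    full = n_actions ** n_agents
    factored = n_agents * n_actions * rank + rank
    return full, factored


# 10 agents with 5 actions each, rank-3 approximation.
print(parameter_counts(10, 5, 3))  # prints (9765625, 153)
```

The count illustrates why approximation is needed: the full tensor grows exponentially in the number of agents, while the factored model grows linearly for a fixed rank. The brute-force argmax is included only to keep the sketch short; it does not scale to large joint action spaces.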
