Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Reinforcement learning in large action spaces is a challenging problem. This is especially true for cooperative multi-agent reinforcement learning (MARL), which often requires tractable learning under constraints such as limited communication budgets and limited information about other agents. In this work, we focus on a fundamental hurdle affecting both value-based and policy-gradient approaches: an exponential blowup of the action space with the number of agents. For value-based methods, this blowup makes it difficult to represent the optimal value function accurately, inducing suboptimality. For policy-gradient methods, it renders the critic ineffective and exacerbates the problem of the lagging critic. We show that, from a learning-theory perspective, both problems can be addressed by accurately representing the associated action-value function with a low-complexity hypothesis class. This requires modelling the agent interactions in a sample-efficient way. To this end, we propose a novel tensorised formulation of the Bellman equation. This gives rise to our method, Tesseract, which views the Q-function as a tensor whose modes correspond to the action spaces of the different agents. Algorithms derived from Tesseract decompose the Q-tensor across the agents and use low-rank tensor approximations to model the agent interactions relevant to the task. We provide PAC analysis for Tesseract-based algorithms and highlight their relevance to the class of rich-observation MDPs. Empirical results in several domains confirm the gains in sample efficiency obtained with Tesseract, as supported by the theory.
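To make the low-rank view concrete, the sketch below illustrates (in Python with NumPy) a rank-k, CP-style factorisation of a joint Q-tensor, where each agent contributes one factor vector per rank component. This is a minimal illustrative sketch, not the authors' implementation: the sizes, the random stand-in factors, and the function names are assumptions introduced here for exposition.

```python
import numpy as np

# Hypothetical sizes: n agents, each with |A| discrete actions, CP rank k.
n_agents, n_actions, rank = 3, 5, 4

# In a Tesseract-style factorisation, each agent i produces, for a given state,
# `rank` factor vectors of length n_actions. Here random arrays stand in for
# the per-agent, state-conditioned network outputs.
factors = [np.random.randn(rank, n_actions) for _ in range(n_agents)]

def q_joint(joint_action):
    """Rank-k approximation of Q(s, a_1, ..., a_n): a sum over rank
    components of the product of per-agent factor entries."""
    return sum(
        np.prod([factors[i][r, a_i] for i, a_i in enumerate(joint_action)])
        for r in range(rank)
    )

# The full joint Q-tensor has n_actions ** n_agents entries, while the
# factored form stores only n_agents * rank * n_actions numbers.
full_q = np.zeros((n_actions,) * n_agents)
for idx in np.ndindex(*full_q.shape):
    full_q[idx] = q_joint(idx)

print(full_q.size, sum(f.size for f in factors))  # e.g. 125 vs. 60
```

Under this assumed factorisation, the number of stored parameters grows linearly rather than exponentially in the number of agents, which is the source of the sample-efficiency gains the abstract refers to.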
