Coding for Distributed Multi-Agent Reinforcement Learning

This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in distributed learning systems due to disturbances such as slowdowns or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework that speeds up the training of MARL algorithms in the presence of stragglers while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed and evaluated. Several coding schemes are investigated, including maximum distance separable (MDS) codes, random sparse codes, replication-based codes, and regular low-density parity-check (LDPC) codes. Simulations on several multi-robot problems demonstrate the promising performance of the proposed framework.
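To make the coded-computation idea concrete, the sketch below shows straggler-tolerant gradient aggregation with a real-valued MDS code built from a Vandermonde matrix. This is a minimal illustration of the general technique, not the paper's exact MADDPG pipeline: the partition sizes, worker count, and random gradients are hypothetical, and the gradients stand in for per-partition policy-gradient computations. Because any k rows of a Vandermonde matrix with distinct nodes are invertible, the master can recover the full gradient from any k of the n workers, so the n - k slowest workers can be ignored.

```python
import numpy as np

def vandermonde(n, k):
    # Distinct real nodes -> any k of the n rows form an invertible
    # Vandermonde submatrix, i.e. an (n, k) MDS generator over the reals.
    x = np.arange(1, n + 1, dtype=float)
    return np.vander(x, k, increasing=True)  # shape (n, k)

rng = np.random.default_rng(0)
k, n, d = 3, 5, 4                      # k data partitions, n workers, gradient dim d
grads = rng.standard_normal((k, d))    # stand-in for per-partition gradients

G = vandermonde(n, k)
coded = G @ grads                      # worker i computes row i: a linear combination
                                       # of the partition gradients

# Simulate stragglers: only workers {0, 2, 4} respond in time.
alive = [0, 2, 4]
decoded = np.linalg.solve(G[alive], coded[alive])  # invert the surviving k x k block
full_grad = decoded.sum(axis=0)        # aggregate gradient, as if no worker had lagged

assert np.allclose(decoded, grads)
assert np.allclose(full_grad, grads.sum(axis=0))
```

Real Vandermonde matrices become ill-conditioned as n grows, which is one reason practical systems also consider the replication-based, sparse, and LDPC alternatives mentioned above; for the small worker counts of this sketch the decoding is numerically stable.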
