Coding for Distributed Multi-Agent Reinforcement Learning

This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in distributed learning systems due to disturbances such as slowdowns or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework that speeds up the training of MARL algorithms in the presence of stragglers while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed and evaluated. Several coding schemes are investigated, including maximum distance separable (MDS) codes, random sparse codes, replication-based codes, and regular low-density parity-check (LDPC) codes. Simulations on several multi-robot problems demonstrate the promising performance of the proposed framework.
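To make the coded-computation idea concrete, the sketch below shows straggler-tolerant gradient aggregation with a real-valued MDS code built from a Vandermonde matrix. This is a minimal illustration of the general technique, not the paper's exact MADDPG pipeline: the partition sizes, worker count, and random gradients are hypothetical, and the gradients stand in for per-partition policy-gradient computations. Because any k rows of a Vandermonde matrix with distinct nodes are invertible, the master can recover the full gradient from any k of the n workers, so the n - k slowest workers can be ignored.

```python
import numpy as np

def vandermonde(n, k):
    # Distinct real nodes -> any k of the n rows form an invertible
    # Vandermonde submatrix, i.e. an (n, k) MDS generator over the reals.
    x = np.arange(1, n + 1, dtype=float)
    return np.vander(x, k, increasing=True)  # shape (n, k)

rng = np.random.default_rng(0)
k, n, d = 3, 5, 4                      # k data partitions, n workers, gradient dim d
grads = rng.standard_normal((k, d))    # stand-in for per-partition gradients

G = vandermonde(n, k)
coded = G @ grads                      # worker i computes row i: a linear combination
                                       # of the partition gradients

# Simulate stragglers: only workers {0, 2, 4} respond in time.
alive = [0, 2, 4]
decoded = np.linalg.solve(G[alive], coded[alive])  # invert the surviving k x k block
full_grad = decoded.sum(axis=0)        # aggregate gradient, as if no worker had lagged

assert np.allclose(decoded, grads)
assert np.allclose(full_grad, grads.sum(axis=0))
```

Real Vandermonde matrices become ill-conditioned as n grows, which is one reason practical systems also consider the replication-based, sparse, and LDPC alternatives mentioned above; for the small worker counts of this sketch the decoding is numerically stable.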
