Paper information - QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations

The paper develops $\mathcal{QD}$-learning, a distributed version of reinforcement $Q$-learning, for multi-agent Markov decision processes (MDPs); the agents have no prior information on the global state transition and on the local agent cost statistics. The network agents minimize a network-averaged infinite horizon discounted cost, by local processing and by collaborating through mutual information exchange over a sparse (possibly stochastic) communication network. The agents respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. When each agent is aware only of its local online cost data and the inter-agent communication network is weakly connected, we prove that $\mathcal{QD}$-learning, a consensus + innovations algorithm with mixed time-scale stochastic dynamics, converges asymptotically almost surely to the desired value function and to the optimal stationary control policy at each network agent.
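
To make the consensus + innovations idea in the abstract concrete, here is a minimal sketch of one synchronous update for tabular Q-functions. It assumes each agent keeps its own table, sees the global state, the controller's action, and only its own one-stage cost, and exchanges the visited table entry with its current neighbors. The function name `qd_step`, the weights `alpha` (innovation) and `beta` (consensus), and the toy two-agent network are illustrative assumptions, not notation taken from the abstract.

```python
from typing import Dict, List

QTable = List[List[float]]  # Q[state][action], one table per agent


def qd_step(
    q_tables: List[QTable],
    neighbors: Dict[int, List[int]],
    x: int,
    u: int,
    x_next: int,
    costs: List[float],
    alpha: float,
    beta: float,
    gamma: float,
) -> List[QTable]:
    """One consensus + innovations update at the visited pair (x, u).

    Assumed form (a sketch, not the paper's verbatim recursion):
      Q_n(x,u) <- Q_n(x,u)
                  - beta  * sum_{l in N(n)} (Q_n(x,u) - Q_l(x,u))       # consensus
                  + alpha * (c_n + gamma * min_v Q_n(x',v) - Q_n(x,u))  # innovation
    """
    # Copy so that every agent updates from the same (old) neighbor values.
    new_tables = [[row[:] for row in q] for q in q_tables]
    for n, q_n in enumerate(q_tables):
        consensus = sum(q_n[x][u] - q_tables[l][x][u] for l in neighbors[n])
        # Costs are minimized, so the target uses min over actions.
        innovation = costs[n] + gamma * min(q_n[x_next]) - q_n[x][u]
        new_tables[n][x][u] = q_n[x][u] - beta * consensus + alpha * innovation
    return new_tables


# Toy example (hypothetical numbers): two agents, two states, two actions.
q_tables = [
    [[1.0, 0.0], [0.0, 0.0]],  # agent 0
    [[3.0, 0.0], [0.0, 0.0]],  # agent 1
]
neighbors = {0: [1], 1: [0]}
q_next = qd_step(
    q_tables, neighbors, x=0, u=0, x_next=1,
    costs=[2.0, 4.0], alpha=0.5, beta=0.25, gamma=0.5,
)
print(q_next[0][0][0], q_next[1][0][0])  # prints: 2.0 3.0
```

In this sketch the weights are fixed for a single step. The "mixed time-scale" dynamics mentioned in the abstract would correspond to letting `alpha` and `beta` decay over time at different rates; the specific decay schedules are not given in the abstract and are left out here.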

H. Vincent Poor | Soummya Kar | José M. F. Moura
