Distributed Dynamic Programming and an O.D.E. Framework of Distributed TD-Learning for Networked Multi-Agent Markov Decision Processes
[1] Shaoshuai Mou, et al. Finite-Sample Analysis of Multi-Agent Policy Evaluation with Kernelized Gradient Temporal Difference, 2020, 2020 59th IEEE Conference on Decision and Control (CDC).
[2] Tamer Basar, et al. Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks, 2020, Autom.
[3] V. Cevher, et al. Optimization for Reinforcement Learning: From a single agent to cooperative agents, 2019, IEEE Signal Processing Magazine.
[4] T. Başar, et al. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms, 2019, Handbook of Reinforcement Learning and Control.
[5] Mihailo R. Jovanovic, et al. Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization, 2019, ArXiv.
[6] Thinh T. Doan, et al. Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning, 2019, ICML.
[7] Kun Yuan, et al. Multiagent Fully Decentralized Value Function Learning With Linear Convergence Rates, 2018, IEEE Transactions on Automatic Control.
[8] Zhuoran Yang, et al. Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization, 2018, NeurIPS.
[9] Milos S. Stankovic, et al. Multi-agent temporal-difference learning with linear function approximation: Weak convergence under time-varying network topologies, 2016, 2016 American Control Conference (ACC).
[10] Qing Ling, et al. EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization, 2014, 1404.6264.
[11] Ali H. Sayed, et al. Distributed Policy Evaluation Under Multiple Behavior Strategies, 2013, IEEE Transactions on Automatic Control.
[12] L. A. Prashanth, et al. Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods, 2012.
[13] Jing Wang, et al. Control approach to distributed optimization, 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[14] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[15] Angelia Nedic, et al. Subgradient Methods for Saddle-Point Problems, 2009, J. Optimization Theory and Applications.
[16] Asuman E. Ozdaglar, et al. Constrained Consensus and Optimization in Multi-Agent Networks, 2008, IEEE Transactions on Automatic Control.
[17] L. Sucar, et al. Markov Decision Processes, 2004, Encyclopedia of Machine Learning and Data Mining.
[18] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[19] A. Jadbabaie, et al. Coordination of groups of mobile autonomous agents using nearest neighbor rules, 2002, Proceedings of the 41st IEEE Conference on Decision and Control, 2002.
[20] Jianghai Hu, et al. Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method, 2022, IEEE Access.
[21] Dimitri P. Bertsekas, et al. Neuro-Dynamic Programming, 2009, Encyclopedia of Optimization.
[22] Hassan K. Khalil, et al. Nonlinear Systems Third Edition, 2008.
[23] Leonard M. Adleman, et al. Proof of proposition 3, 1992.