Mihailo R. Jovanović | Zhaoran Wang | Zhuoran Yang | Dongsheng Ding | Xiaohan Wei
[1] Thinh T. Doan,et al. Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning , 2019, ICML.
[2] Ali H. Sayed,et al. Distributed Policy Evaluation Under Multiple Behavior Strategies , 2013, IEEE Transactions on Automatic Control.
[3] Shie Mannor,et al. Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning , 2017, COLT.
[4] Tie-Yan Liu,et al. Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting , 2017, NIPS.
[5] Michael G. Rabbat,et al. Distributed strongly convex optimization , 2012, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[6] Tamer Basar,et al. Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks , 2020, Autom..
[7] Martin J. Wainwright,et al. Information-theoretic lower bounds on the oracle complexity of convex optimization , 2009, NIPS.
[8] Thinh T. Doan,et al. Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation , 2019, SIAM J. Math. Data Sci..
[9] L. Györfi,et al. On the Averaged Stochastic Approximation for Linear Regression , 1996 .
[10] Mohammad S. Obaidat,et al. Residential Energy Management in Smart Grid: A Markov Decision Process-Based Approach , 2013, 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing.
[11] Milos S. Stankovic,et al. Multi-agent temporal-difference learning with linear function approximation: Weak convergence under time-varying network topologies , 2016, 2016 American Control Conference (ACC).
[12] Tamer Basar,et al. Decentralized multi-agent reinforcement learning with networked agents: recent advances , 2019, Frontiers of Information Technology & Electronic Engineering.
[13] Qing Ling,et al. Solving Non-smooth Constrained Programs with Lower Complexity than O(1/ε): A Primal-Dual Homotopy Smoothing Approach , 2018, NeurIPS.
[14] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[15] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[16] Bin Hu,et al. Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory , 2019, NeurIPS.
[17] Marek Petrik,et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms , 2015, UAI.
[18] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[19] Stephen P. Boyd,et al. Gossip algorithms: design, analysis and applications , 2005, Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies..
[20] Tianbao Yang,et al. Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/ε) , 2016, NIPS.
[21] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[22] Pascal Vincent,et al. Convergent Tree-Backup and Retrace with Function Approximation , 2017, ICML.
[23] R. Srikant,et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning , 2019, COLT.
[24] Alexander Shapiro,et al. Stochastic Approximation Approach to Stochastic Programming , 2013.
[25] Vivek S. Borkar,et al. Distributed Reinforcement Learning via Gossip , 2013, IEEE Transactions on Automatic Control.
[26] Ioannis Ch. Paschalidis,et al. A Distributed Actor-Critic Algorithm and Applications to Mobile Sensor Network Coordination Problems , 2010, IEEE Transactions on Automatic Control.
[27] Zhuoran Yang,et al. Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization , 2018, NeurIPS.
[28] Jianghai Hu,et al. Primal-Dual Distributed Temporal Difference Learning , 2018, arXiv:1805.07918.
[29] Asuman E. Ozdaglar,et al. Distributed Subgradient Methods for Multi-Agent Optimization , 2009, IEEE Transactions on Automatic Control.
[30] Naira Hovakimyan,et al. Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD , 2018, 2018 IEEE Conference on Decision and Control (CDC).
[31] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[32] Lin Xiao,et al. A Proximal-Gradient Homotopy Method for the Sparse Least-Squares Problem , 2012, SIAM J. Optim..
[33] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[34] Martin J. Wainwright,et al. Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling , 2010, IEEE Transactions on Automatic Control.
[35] Tamer Basar,et al. Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning , 2018, ArXiv.
[36] Yingbin Liang,et al. Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation , 2019, ArXiv.
[37] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[38] Shimon Whiteson,et al. Multiagent Reinforcement Learning for Urban Traffic Control Using Coordination Graphs , 2008, ECML/PKDD.
[39] Kun Yuan,et al. Multiagent Fully Decentralized Value Function Learning With Linear Convergence Rates , 2018, IEEE Transactions on Automatic Control.
[40] Jalaj Bhandari,et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[41] Ali H. Sayed,et al. Multi-Agent Fully Decentralized Off-Policy Learning with Linear Convergence Rates , 2018, ArXiv.
[42] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[43] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 2005, Machine Learning.
[44] Georgios B. Giannakis,et al. Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation , 2020, AISTATS.
[45] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[46] I. Pinelis. Optimum Bounds for the Distributions of Martingales in Banach Spaces , 1994, arXiv:1208.2200.
[47] Tianbao Yang,et al. RSG: Beating Subgradient Method without Smoothness and Strong Convexity , 2015, J. Mach. Learn. Res..
[48] Elizabeth L. Wilmer,et al. Markov Chains and Mixing Times , 2008 .
[49] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[50] H. Vincent Poor,et al. QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations , 2012, IEEE Trans. Signal Process..
[51] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[52] Thinh T. Doan,et al. Finite-Time Performance of Distributed Two-Time-Scale Stochastic Approximation , 2019, L4DC.
[53] Volkan Cevher,et al. Optimization for Reinforcement Learning: From a single agent to cooperative agents , 2020, IEEE Signal Processing Magazine.
[54] Kaiqing Zhang,et al. Finite-Sample Analysis for Decentralized Batch Multi-Agent Reinforcement Learning with Networked Agents , 2018.
[55] Shie Mannor,et al. Finite Sample Analyses for TD(0) With Function Approximation , 2017, AAAI.
[56] Tamer Basar,et al. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms , 2019, Handbook of Reinforcement Learning and Control.
[57] Csaba Szepesvári,et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go? , 2018, AISTATS.
[58] Michael I. Jordan,et al. Ergodic mirror descent , 2011, 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[59] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[60] Angelia Nedic,et al. Subgradient Methods for Saddle-Point Problems , 2009, J. Optimization Theory and Applications.
[61] Tamer Basar,et al. Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents , 2018, ICML.
[62] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[63] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.