Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis
Georgios B. Giannakis | Jian Sun | Gerald Tesauro | Songtao Lu | Gang Wang
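The title names decentralized TD learning with linear function approximation: a network of agents sharing one Markov chain, each observing only a local reward, that cooperate through neighbor averaging to evaluate a common policy. A minimal sketch of the basic scheme (plain consensus-plus-local-TD(0), not the paper's tracking variant; all problem sizes, the ring topology, and the mixing weights below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# --- hypothetical toy problem; every constant here is an illustrative assumption ---
N_AGENTS = 4   # agents on a ring communication graph
N_STATES = 6   # shared finite Markov chain
DIM = 3        # feature dimension for linear value approximation
GAMMA = 0.9    # discount factor
ALPHA = 0.05   # step size

# random row-stochastic transition matrix for the shared chain
P = rng.random((N_STATES, N_STATES))
P /= P.sum(axis=1, keepdims=True)

# fixed feature map: row s is phi(s) in R^DIM
PHI = rng.standard_normal((N_STATES, DIM))

# each agent sees only its own local reward function
local_rewards = rng.random((N_AGENTS, N_STATES))

# doubly stochastic mixing matrix W for a ring graph
W = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    W[i, i] = 0.5
    W[i, (i - 1) % N_AGENTS] = 0.25
    W[i, (i + 1) % N_AGENTS] = 0.25

theta = np.zeros((N_AGENTS, DIM))  # one parameter vector per agent

s = 0
for t in range(20000):
    s_next = rng.choice(N_STATES, p=P[s])
    # each agent takes a local TD(0) half-step with its own reward
    half = np.empty_like(theta)
    for i in range(N_AGENTS):
        delta = (local_rewards[i, s]
                 + GAMMA * PHI[s_next] @ theta[i]
                 - PHI[s] @ theta[i])
        half[i] = theta[i] + ALPHA * delta * PHI[s]
    # consensus step: mix parameters with graph neighbors
    theta = W @ half
    s = s_next

# mixing should keep the agents' parameters close to their network average,
# which approximates evaluating the network-wide average reward
spread = np.max(np.abs(theta - theta.mean(axis=0)))
print("max disagreement across agents:", spread)
```

The gradient-tracking refinement the title refers to replaces the plain averaging above with an extra tracked variable per agent that estimates the network-wide TD update direction, which is what enables the finite-time bounds studied in the paper.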
[1] Michael I. Jordan,et al. Technical report, Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[2] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[3] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[4] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[5] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[6] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[7] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.
[8] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[9] Gerald Tesauro,et al. Practical issues in temporal difference learning , 1992, Machine Learning.
[10] Vladislav Tadic,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001, Machine Learning.
[11] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[12] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[13] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[14] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[15] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[16] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[17] V. Climenhaga. Markov chains and mixing times , 2013 .
[18] Nathaniel Korda,et al. On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence , 2014, ICML.
[19] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[20] Marek Petrik,et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms , 2015, UAI.
[21] Tie-Yan Liu,et al. Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting , 2017, NIPS.
[22] Vivek S. Borkar,et al. Distributed Reinforcement Learning via Gossip , 2013, IEEE Transactions on Automatic Control.
[23] Shie Mannor,et al. Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning , 2017, COLT.
[24] Olexandr Isayev,et al. Deep reinforcement learning for de novo drug design , 2017, Science Advances.
[25] Tamer Basar,et al. Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents , 2018, ICML.
[26] Jalaj Bhandari,et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[27] Zhuoran Yang,et al. Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization , 2018, NeurIPS.
[28] Shie Mannor,et al. Finite Sample Analyses for TD(0) With Function Approximation , 2017, AAAI.
[29] Csaba Szepesvári,et al. Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go? , 2018, AISTATS.
[30] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[31] R. Srikant,et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning , 2019, COLT.
[32] Georgios B. Giannakis,et al. A Multistep Lyapunov Approach for Finite-Time Analysis of Biased Stochastic Approximation , 2019, ArXiv.
[33] Hartmut Neven,et al. Universal quantum control through deep reinforcement learning , 2018, npj Quantum Information.
[34] Thinh T. Doan,et al. Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning , 2019, ICML.
[35] R. Srikant,et al. Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning , 2019, NeurIPS.
[36] Fredrik D. Johansson,et al. Guidelines for reinforcement learning in healthcare , 2019, Nature Medicine.
[37] Soummya Kar,et al. An introduction to decentralized stochastic optimization with gradient tracking , 2019 .
[38] Bin Hu,et al. Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory , 2019, NeurIPS.
[39] Yingbin Liang,et al. Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples , 2019, NeurIPS.
[40] Georgios B. Giannakis,et al. Almost Tune-Free Variance Reduction , 2019, ICML.
[41] Mingyi Hong,et al. Distributed Learning in the Nonconvex World: From batch data to streaming and beyond , 2020, IEEE Signal Processing Magazine.
[42] Georgios B. Giannakis,et al. On the Convergence of SARAH and Beyond , 2019, AISTATS.
[43] G. Giannakis,et al. Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation , 2019, AISTATS.
[44] Yingbin Liang,et al. Reanalysis of Variance Reduced Temporal Difference Learning , 2020, ICLR.
[45] Angelia Nedic,et al. Distributed stochastic gradient tracking methods , 2018, Mathematical Programming.
[46] Kun Yuan,et al. Multiagent Fully Decentralized Value Function Learning With Linear Convergence Rates , 2018, IEEE Transactions on Automatic Control.
[47] Martin J. Wainwright,et al. Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis , 2020, SIAM J. Math. Data Sci..