A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning
[1] Thinh T. Doan, et al. Finite-Sample Analysis of Two-Time-Scale Natural Actor–Critic Algorithm, 2021, IEEE Transactions on Automatic Control.
[2] Thinh T. Doan, et al. Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning, 2019, Automatica.
[3] Tyler H. Summers, et al. Learning Optimal Controllers for Linear Systems With Multiplicative Noise via Policy Gradient, 2021, IEEE Transactions on Automatic Control.
[4] Jieping Ye, et al. On Finite-Time Convergence of Actor-Critic Algorithm, 2021, IEEE Journal on Selected Areas in Information Theory.
[5] Thinh T. Doan, et al. Finite-Time Analysis of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning, 2020, 2021 60th IEEE Conference on Decision and Control (CDC).
[6] Thinh T. Doan, et al. Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation, 2019, SIAM Journal on Control and Optimization.
[7] Siva Theja Maguluri, et al. Finite Sample Analysis of Average-Reward TD Learning and Q-Learning, 2021, NeurIPS.
[8] Mingyi Hong, et al. A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic, 2020, ArXiv.
[9] Zhe Wang, et al. Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms, 2020, ArXiv.
[10] Quanquan Gu, et al. A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods, 2020, NeurIPS.
[11] Mikhail Belkin, et al. Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning, 2020, ArXiv.
[12] Hoi-To Wai, et al. Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise, 2020, COLT.
[13] Thinh T. Doan, et al. Finite-Time Performance of Distributed Two-Time-Scale Stochastic Approximation, 2019, L4DC.
[14] Balázs Szörényi, et al. A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound, 2019, AAAI.
[15] Harshat Kumar, et al. On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation, 2019, ArXiv.
[16] Thinh T. Doan, et al. Linear Two-Time-Scale Stochastic Approximation: A Finite-Time Analysis, 2019, 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[17] R. Srikant, et al. Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning, 2019, NeurIPS.
[18] Yongxin Chen, et al. On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost, 2019, ArXiv.
[19] Shaofeng Zou, et al. Finite-Sample Analysis for SARSA with Linear Function Approximation, 2019, NeurIPS.
[20] Tamer Basar, et al. A Finite Sample Analysis of the Actor-Critic Algorithm, 2018, 2018 IEEE Conference on Decision and Control (CDC).
[21] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
[22] Shie Mannor, et al. Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning, 2017, COLT.
[23] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[24] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[25] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[26] Richard S. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[27] A. Mokkadem, et al. Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms, 2006, arXiv:math/0610329.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[29] Vijay R. Konda, et al. Convergence rate of linear two-time-scale stochastic approximation, 2004, arXiv:math/0405287.
[30] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[31] Vijay R. Konda, et al. Actor-Critic Algorithms, 1999, NIPS.