[1] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[2] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[3] H. Robbins. A Stochastic Approximation Method , 1951 .
[4] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[5] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[6] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[7] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[8] S. Smale. Convergent process of price adjustment and global Newton methods , 1976 .
[9] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region , 2000, NIPS.
[10] Shalabh Bhatnagar,et al. Stability of Stochastic Approximations With “Controlled Markov” Noise and Temporal Difference Learning , 2015, IEEE Transactions on Automatic Control.
[11] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[12] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[13] S. Liberty,et al. Linear Systems , 2010, Scientific Parallel Computing.
[14] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[15] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[16] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[17] Marek Petrik,et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms , 2015, UAI.
[18] Adithya M. Devraj,et al. Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning , 2020, ArXiv.
[19] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[20] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[21] Hilbert J. Kappen,et al. Speedy Q-Learning , 2011, NIPS.
[22] M. Metivier,et al. Applications of a Kushner and Clark lemma to general classes of stochastic algorithms , 1984, IEEE Trans. Inf. Theory.
[23] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[24] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[25] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[26] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
[27] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[28] O. Nelles,et al. An Introduction to Optimization , 1996, IEEE Antennas and Propagation Magazine.
[29] R. Srikant,et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning , 2019, COLT.
[30] Carlos S. Kubrusly,et al. Stochastic approximation algorithms and applications , 1973, CDC 1973.
[31] G. Fort,et al. Convergence of Markovian Stochastic Approximation with Discontinuous Dynamics , 2014, SIAM J. Control. Optim..
[32] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[33] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[34] Sean P. Meyn,et al. Zap Q-Learning - A User's Guide , 2019, 2019 Fifth Indian Control Conference (ICC).
[35] Bo Liu,et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces , 2014, ArXiv.
[36] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[37] Sean P. Meyn,et al. A Liapounov bound for solutions of the Poisson equation , 1996 .
[38] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[39] Lawrence C. Evans,et al. Weak convergence methods for nonlinear partial differential equations , 1990 .
[40] V. Borkar,et al. A Concentration Bound for Stochastic Approximation via Alekseev’s Formula , 2015, Stochastic Systems.
[41] Wray L. Buntine,et al. Computing second derivatives in feed-forward networks: a review , 1994, IEEE Trans. Neural Networks.
[42] Yurii Nesterov,et al. Lectures on Convex Optimization , 2018 .
[43] Shalabh Bhatnagar,et al. A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions , 2015, Math. Oper. Res..
[44] Sean P. Meyn,et al. Zap Q-Learning , 2017, NIPS.
[45] László Gerencsér,et al. Convergence rate of moments in stochastic approximation with simultaneous perturbation gradient approximation and resetting , 1999, IEEE Trans. Autom. Control..
[46] Shalabh Bhatnagar,et al. Dynamics of stochastic approximation with Markov iterate-dependent noise with the stability of the iterates not ensured , 2016 .
[47] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[48] R. Sutton,et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[49] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[50] F. Clarke. Functional Analysis, Calculus of Variations and Optimal Control , 2013 .
[51] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[52] D. Ruppert. A Newton-Raphson Version of the Multivariate Robbins-Monro Procedure , 1985 .
[53] Michael I. Jordan,et al. Acceleration via Symplectic Discretization of High-Resolution Differential Equations , 2019, NeurIPS.
[54] Thinh T. Doan,et al. Performance of Q-learning with Linear Function Approximation: Stability and Finite-Time Analysis , 2019 .
[55] Shalabh Bhatnagar,et al. Two Timescale Stochastic Approximation with Controlled Markov noise , 2015, Math. Oper. Res..
[56] Yoram Singer,et al. Second Order Optimization Made Practical , 2020, ArXiv.
[57] Santiago Zazo,et al. Diffusion gradient temporal difference for cooperative reinforcement learning with linear function approximation , 2012, 2012 3rd International Workshop on Cognitive Information Processing (CIP).
[58] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[59] Charles R. Johnson,et al. Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.
[60] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[61] Stephen P. Boyd,et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , 2014, J. Mach. Learn. Res..
[62] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[63] J. Tsitsiklis,et al. Convergence rate of linear two-time-scale stochastic approximation , 2004, math/0405287.
[64] P. Olver. Nonlinear Systems , 2013 .
[65] Shie Mannor,et al. Concentration Bounds for Two Timescale Stochastic Approximation with Applications to Reinforcement Learning , 2017, ArXiv.
[66] Dimitri P. Bertsekas,et al. Error Bounds for Approximations from Projected Linear Equations , 2010, Math. Oper. Res..
[67] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[68] Magnus Egerstedt,et al. Performance regulation and tracking via lookahead simulation: Preliminary results and validation , 2017, 2017 IEEE 56th Annual Conference on Decision and Control (CDC).
[69] Ana Busic,et al. Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation , 2020, AISTATS.
[70] Magnus Egerstedt,et al. Tracking Control by the Newton-Raphson Flow: Applications to Autonomous Vehicles , 2019, 2019 18th European Control Conference (ECC).
[71] Shalabh Bhatnagar,et al. A stability criterion for two timescale stochastic approximation schemes , 2017, Autom..
[72] Eric Moulines,et al. Stability of Stochastic Approximation under Verifiable Conditions , 2005, Proceedings of the 44th IEEE Conference on Decision and Control.
[73] Vivek S. Borkar,et al. Concentration bounds for two time scale stochastic approximation , 2018, 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[74] Sean P. Meyn,et al. Fastest Convergence for Q-learning , 2017, ArXiv.