暂无分享,去创建一个
[1] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[2] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[3] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[4] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[5] Marcello Restelli,et al. Stochastic Variance-Reduced Policy Gradient , 2018, ICML.
[6] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[7] M. Littman,et al. Approaching Bayes-optimalilty using Monte-Carlo tree search , 2011 .
[8] Tamer Başar,et al. Convergence and Iteration Complexity of Policy Gradient Method for Infinite-horizon Reinforcement Learning , 2019, 2019 IEEE 58th Conference on Decision and Control (CDC).
[9] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[10] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[11] Luca Bascetta,et al. Policy gradient in Lipschitz Markov Decision Processes , 2015, Machine Learning.
[12] David Silver,et al. Monte-Carlo tree search and rapid action value estimation in computer Go , 2011, Artif. Intell..
[13] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[14] R. Sutton,et al. A convergent O ( n ) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[15] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[16] Ron Meir,et al. A Convergent Online Single Time Scale Actor Critic Algorithm , 2009, J. Mach. Learn. Res..
[17] Shie Mannor,et al. Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning , 2017, COLT.
[18] Tamer Basar,et al. Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents , 2018, ICML.
[19] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[20] L. Eon Bottou. Online Learning and Stochastic Approximations , 1998 .
[21] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[22] Yingbin Liang,et al. Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation , 2019, ArXiv.
[23] Jalaj Bhandari,et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[24] Jooyoung Park,et al. Universal Approximation Using Radial-Basis-Function Networks , 1991, Neural Computation.
[25] André Elisseeff,et al. Stability and Generalization , 2002, J. Mach. Learn. Res..
[26] V. Borkar. Stochastic approximation with two time scales , 1997 .
[27] Santiago Paternain,et al. Stochastic Control Foundations of Autonomous Behavior , 2018 .
[28] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[29] R Bellman,et al. On the Theory of Dynamic Programming. , 1952, Proceedings of the National Academy of Sciences of the United States of America.
[30] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[31] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[32] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[33] Mengdi Wang,et al. Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions , 2014, Mathematical Programming.
[34] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[35] Warren B. Powell,et al. “Approximate dynamic programming: Solving the curses of dimensionality” by Warren B. Powell , 2007, Wiley Series in Probability and Statistics.
[36] John Wright,et al. A Geometric Analysis of Phase Retrieval , 2016, International Symposium on Information Theory.
[37] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[38] Pierpaolo Pontrandolfo,et al. Inventory management in supply chains: a reinforcement learning approach , 2002 .
[39] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[40] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[41] Junwei Lu,et al. Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization , 2016, ArXiv.
[42] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[43] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[44] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[45] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..
[46] Panos M. Pardalos,et al. Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..
[47] Warren B. Powell,et al. A comparison of approximate dynamic programming techniques on benchmark energy storage problems: Does anything work? , 2014, 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[48] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[49] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[50] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[51] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[52] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[53] Kenji Doya,et al. Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.