Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs
[1] Csaba Szepesvári, et al. Learning near-optimal policies with fitted policy iteration and a single sample path, 2005.
[2] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res..
[3] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control Optim..
[4] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 1998, Proceedings of the 37th IEEE Conference on Decision and Control.
[5] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[6] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[7] Gerald Tesauro, et al. TD-Gammon: A Self-Teaching Backgammon Program, 1995.
[8] Andrea Bonarini, et al. Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods, 2007, NIPS.
[9] Shalabh Bhatnagar, et al. Incremental natural-gradient actor-critic algorithms, 2007.
[10] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[11] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[12] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[13] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[14] Geoffrey J. Gordon. Stable Fitted Reinforcement Learning, 1995, NIPS.
[15] Shigenobu Kobayashi, et al. Reinforcement learning for continuous action using stochastic gradient ascent, 1998.
[16] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[17] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[18] Csaba Szepesvári, et al. Finite time bounds for sampling based fitted value iteration, 2005, ICML.
[19] Richard S. Sutton, et al. Open Theoretical Questions in Reinforcement Learning, 1999, EuroCOLT.
[20] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[21] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[22] M. A. Wiering, et al. Reinforcement Learning in Continuous Action Spaces, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[23] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[24] Takeo Kanade, et al. Intelligent Autonomous Systems, 1991, Robotics Auton. Syst..
[25] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res..
[26] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[28] Richard S. Sutton, et al. Reinforcement learning with replacing eligibility traces, 2004, Machine Learning.
[29] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[30] Rémi Munos. Error Bounds for Approximate Policy Iteration, 2003, ICML.