Natural-Gradient Actor-Critic Algorithms
[1] Harold J. Kushner,et al. Stochastic Approximation Methods for Constrained and Unconstrained Systems , 1978 .
[2] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[3] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[4] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[5] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[6] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[7] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[8] John N. Tsitsiklis,et al. Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[9] M. Narasimha Murty,et al. Information theoretic justification of Boltzmann selection and its generalization to Tsallis case , 2005, 2005 IEEE Congress on Evolutionary Computation.
[10] Peter Stone,et al. Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.
[11] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[12] K. I. M. McKinnon,et al. On the Generation of Markov Decision Processes , 1995 .
[13] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[14] Peter Dayan,et al. Analytical Mean Squared Error Curves for Temporal Difference Learning , 1996, Machine Learning.
[15] Solomon Lefschetz,et al. Stability by Liapunov's Direct Method With Applications , 1962 .
[16] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..
[17] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[18] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[19] V. Borkar. Stochastic approximation with two time scales , 1997 .
[20] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI 2003.
[21] R. Bellman,et al. Functional Approximations and Dynamic Programming , 1959 .
[22] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[23] John Rust. Numerical dynamic programming in economics , 1996 .
[24] Morris W. Hirsch,et al. Convergent activation dynamics in continuous time networks , 1989, Neural Networks.
[25] Carlos S. Kubrusly,et al. Stochastic approximation algorithms and applications , 1973, CDC 1973.
[26] Michail G. Lagoudakis,et al. Model-Free Least-Squares Policy Iteration , 2001, NIPS.
[27] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[28] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[29] S. Andradóttir,et al. A Simulated Annealing Algorithm with Constant Temperature for Discrete Stochastic Optimization , 1999 .
[30] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[31] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[32] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[33] Vladislav Tadic,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001, Machine Learning.
[34] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[35] J. Tsitsiklis,et al. An optimal one-way multigrid algorithm for discrete-time stochastic control , 1991 .
[36] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[37] S. Thomas Alexander,et al. Adaptive Signal Processing , 1986, Texts and Monographs in Computer Science.
[38] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[39] Vijay R. Konda,et al. Actor-Critic Algorithms , 1999, NIPS.
[40] Jonathan Baxter. KnightCap: A chess program that learns by combining TD(λ) with game-tree search , 1998 .
[41] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[42] Alborz Geramifard,et al. Incremental Least-Squares Temporal Difference Learning , 2006, AAAI.
[43] D. J. White,et al. A Survey of Applications of Markov Decision Processes , 1993 .
[44] Shalabh Bhatnagar,et al. A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes , 2004, IEEE Transactions on Automatic Control.
[45] V. Borkar. Recursive self-tuning control of finite Markov chains , 1997 .
[46] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[47] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[48] James W. Daniel,et al. Splines and efficiency in dynamic programming , 1976 .
[49] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[50] Abraham Thomas,et al. Learning Algorithms for Markov Decision Processes , 2009 .
[51] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[52] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[53] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[54] Xi-Ren Cao,et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes , 1997, IEEE Trans. Autom. Control..
[55] Shalabh Bhatnagar,et al. Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes , 2007, Discret. Event Dyn. Syst..