Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control