Hierarchical Approximate Policy Iteration With Binary-Tree State Space Decomposition
Xin Xu | Chunming Liu | Simon X. Yang | Dewen Hu
[1] D. Liu, et al. Adaptive Dynamic Programming for Finite-Horizon Optimal Control of Discrete-Time Nonlinear Systems With $\varepsilon$-Error Bound, 2011, IEEE Transactions on Neural Networks.
[2] Peter Stone, et al. Model-Based Exploration in Continuous State Spaces, 2007, SARA.
[3] S. Shankar Sastry, et al. Autonomous Helicopter Flight via Reinforcement Learning, 2003, NIPS.
[4] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[5] Huaguang Zhang, et al. Adaptive Dynamic Programming: An Introduction, 2009, IEEE Computational Intelligence Magazine.
[6] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[7] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[9] Christos Dimitrakakis, et al. Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration, 2008, EWRL.
[10] Sridhar Mahadevan, et al. Hierarchical Average Reward Reinforcement Learning, 2007, J. Mach. Learn. Res.
[11] Shie Mannor, et al. The Kernel Recursive Least-Squares Algorithm, 2004, IEEE Transactions on Signal Processing.
[12] Huaguang Zhang, et al. Neural-Network-Based Near-Optimal Control for a Class of Discrete-Time Affine Nonlinear Systems With Control Constraints, 2009, IEEE Transactions on Neural Networks.
[13] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[14] Xin Xu, et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, 2007, IEEE Transactions on Neural Networks.
[15] Manuela M. Veloso, et al. Tree Based Discretization for Continuous State Space Reinforcement Learning, 1998, AAAI/IAAI.
[16] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[17] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003.
[18] Andrew G. Barto, et al. Automated State Abstraction for Options using the U-Tree Algorithm, 2000, NIPS.
[19] Vladimir Vapnik, et al. Statistical Learning Theory, 1998.
[20] H. He, et al. Efficient Reinforcement Learning Using Recursive Least-Squares Methods, 2011, J. Artif. Intell. Res.
[21] Thomas G. Dietterich. State Abstraction in MAXQ Hierarchical Reinforcement Learning, 1999, NIPS.
[22] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[23] Bernhard Hengst, et al. Safe State Abstraction and Reusable Continuing Subtasks in Hierarchical Reinforcement Learning, 2007, Australian Conference on Artificial Intelligence.
[24] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[25] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[26] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[27] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[28] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996, NIPS.
[29] Steffen Udluft, et al. Ensembles of Neural Networks for Robust Reinforcement Learning, 2010, Ninth International Conference on Machine Learning and Applications.
[30] Donald C. Wunsch, et al. Backpropagation and Ordered Derivatives in the Time Scales Calculus, 2010, IEEE Transactions on Neural Networks.
[31] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[32] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[33] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[34] David Andre, et al. State Abstraction for Programmable Reinforcement Learning Agents, 2002, AAAI/IAAI.
[35] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[36] Marco Wiering, et al. Ensemble Algorithms in Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[37] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents, 1998, Machine Learning.
[38] R. Bellman. Dynamic Programming, 1957, Science.
[39] Q. Hu, et al. Markov Decision Processes with Their Applications, 2007.
[40] Andrew McCallum, et al. Reinforcement Learning with Selective Perception and Hidden State, 1996.
[41] Lihong Li, et al. Online Exploration in Least-Squares Policy Iteration, 2009, AAMAS.
[42] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[43] Haibo He, et al. Adaptive Learning and Control for MIMO System Based on Adaptive Dynamic Programming, 2011, IEEE Transactions on Neural Networks.
[44] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[45] Jennie Si, et al. Approximate Robust Policy Iteration Using Multilayer Perceptron Neural Networks for Discounted Infinite-Horizon Markov Decision Processes With Uncertain Correlated Transition Matrices, 2010, IEEE Transactions on Neural Networks.
[46] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, Machine Learning.
[47] Xin Xu, et al. Residual-Gradient-Based Neural Reinforcement Learning for the Optimal Control of an Acrobot, 2002, Proceedings of the IEEE International Symposium on Intelligent Control.