Learning state and action space hierarchies for reinforcement learning using action-dependent partitioning
[1] Roderic A. Grupen,et al. Learning to Coordinate Controllers - Reinforcement Learning on a Control Basis , 1997, IJCAI.
[2] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[3] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[4] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[5] Robert Givan,et al. Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes , 1997, UAI.
[6] David Harel,et al. Statecharts: A Visual Formalism for Complex Systems , 1987, Sci. Comput. Program..
[7] Sebastian Thrun,et al. Finding Structure in Reinforcement Learning , 1994, NIPS.
[8] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[9] R. A. Brooks,et al. Intelligence without Representation , 1991, Artif. Intell..
[10] R. Korf. Learning to solve problems by searching for macro-operators , 1983 .
[11] Pattie Maes,et al. Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments , 1996 .
[12] Chris Drummond. Using a Case Base of Surfaces to Speed-Up Reinforcement Learning , 1997, ICCBR.
[13] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[14] Maja J. Matarić,et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks , 1996 .
[15] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[16] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[17] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[18] Robert Givan,et al. Equivalence notions and model minimization in Markov decision processes , 2003, Artif. Intell..
[19] Manfred Huber,et al. Autonomous Subgoal Discovery and Hierarchical Abstraction for Reinforcement Learning Using Monte Carlo Method , 2005, AAAI.
[20] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[21] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[22] Andrew G. Barto,et al. Heuristic Search in Infinite State Spaces Guided by Lyapunov Analysis , 2001, IJCAI.
[23] M. Huber,et al. Accelerating Action Dependent Hierarchical Reinforcement Learning Through Autonomous Subgoal Discovery , 2005 .
[24] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[25] Martin L. Puterman,et al. Discounted Markov Decision Problems , 2008 .
[26] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[27] Manfred Huber,et al. Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies , 2003 .
[28] Kee-Eung Kim,et al. Solving Factored MDPs Using Non-homogeneous Partitions , 1998 .
[29] Doina Precup,et al. Between MDPs and Semi-MDPs: Learning, Planning & Representing Knowledge at Multiple Temporal Scales , 1998 .
[30] Glenn A. Iba,et al. A heuristic approach to the discovery of macro-operators , 2004, Machine Learning.
[31] Ronald E. Parr,et al. Hierarchical control and learning for markov decision processes , 1998 .
[32] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[33] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[34] Richard Fikes,et al. Learning and Executing Generalized Robot Plans , 1993, Artif. Intell..
[35] Thomas G. Dietterich. An Overview of MAXQ Hierarchical Reinforcement Learning , 2000, SARA.
[36] Abhijit Gosavi,et al. Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning , 2007 .
[37] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.