Hierarchical Average Reward Reinforcement Learning
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] R. Bellman. Dynamic programming. , 1957, Science.
[3] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks , 2004, Machine Learning.
[4] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[5] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[6] Leslie Pack Kaelbling,et al. Learning to Achieve Goals , 1993, IJCAI.
[7] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[8] Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
[9] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[10] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[11] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
[12] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[13] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[14] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[15] Prasad Tadepalli,et al. Auto-Exploratory Average Reward Reinforcement Learning , 1996, AAAI/IAAI, Vol. 1.
[16] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[17] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[18] Prasad Tadepalli,et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function , 1996, ICML.
[19] Prasad Tadepalli,et al. Model-Based Average Reward Reinforcement Learning , 1998, Artif. Intell..
[20] Ronald E. Parr,et al. Hierarchical control and learning for markov decision processes , 1998 .
[21] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .
[22] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[23] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[24] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[25] Gang Wang,et al. Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes , 1999, ICML.
[26] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[27] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.
[28] David Andre,et al. Programmable Reinforcement Learning Agents , 2000, NIPS.
[29] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[30] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[31] Sridhar Mahadevan,et al. Continuous-Time Hierarchical Reinforcement Learning , 2001, ICML.
[32] Model-based Hierarchical Average-reward Reinforcement Learning , 2002, ICML.
[33] David Andre,et al. State abstraction for programmable reinforcement learning agents , 2002, AAAI/IAAI.
[34] Sridhar Mahadevan,et al. Hierarchically Optimal Average Reward Reinforcement Learning , 2002, ICML.
[35] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[36] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[37] Sridhar Mahadevan,et al. Hierarchical Policy Gradient Algorithms , 2003, ICML.
[38] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[39] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[40] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[41] Sridhar Mahadevan,et al. Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.
[42] Abhijit Gosavi,et al. Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning , 2007 .
[43] Sridhar Mahadevan,et al. Designing Agent Controllers using Discrete-Event Markov Models , 2007 .
[44] J. E. Rooda,et al. Modeling and Analysis of Manufacturing Systems , 2007, Handbook of Dynamic System Modeling.