Online learning of shaping rewards in reinforcement learning
[1] C. Anderson,et al. Multigrid Q-learning , 1994 .
[2] J. Tsitsiklis,et al. An optimal one-way multigrid algorithm for discrete-time stochastic control , 1991 .
[3] Paolo Traverso,et al. Automated Planning: Theory & Practice , 2004 .
[4] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[5] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[6] Kevin D. Seppi,et al. Prioritization Methods for Accelerating MDP Solvers , 2005, J. Mach. Learn. Res.
[7] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[8] M. Grzes,et al. Plan-based reward shaping for reinforcement learning , 2008, 2008 4th International IEEE Conference Intelligent Systems.
[9] Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
[10] Manuela M. Veloso,et al. Layered Learning , 2000, ECML.
[11] Vadim Bulitko,et al. Real-Time Heuristic Search with a Priority Queue , 2007, IJCAI.
[12] Peter Norvig,et al. Artificial intelligence - a modern approach, 2nd Edition , 2003, Prentice Hall series in artificial intelligence.
[13] Peter Stone,et al. Behavior transfer for value-function-based reinforcement learning , 2005, AAMAS '05.
[14] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002 .
[15] Daniel Kudenko,et al. Multigrid Reinforcement Learning with Reward Shaping , 2008, ICANN.
[16] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[17] Christopher M. Bishop,et al. Neural networks for pattern recognition , 1995 .
[18] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[19] Andrew W. Moore,et al. Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs , 1999, IJCAI.
[20] Stefan Edelkamp,et al. Automated Planning: Theory and Practice , 2007, Künstliche Intell.
[21] Alexander L. Strehl,et al. PAC Reinforcement Learning Bounds for RTDP and Rand-RTDP Technical Report , 2006 .
[22] S.J.J. Smith,et al. Empirical Methods for Artificial Intelligence , 1995 .
[23] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[24] Bhaskara Marthi,et al. Automatic shaping and decomposition of reward functions , 2007, ICML '07.
[25] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res.
[26] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell.
[27] Eric Wiewiora,et al. Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res.
[28] Andrew G. Barto,et al. Autonomous shaping: knowledge transfer in reinforcement learning , 2006, ICML.
[29] Andrew G. Barto,et al. Shaping as a method for accelerating reinforcement learning , 1992, Proceedings of the 1992 IEEE International Symposium on Intelligent Control.
[30] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[31] Niko Bohm,et al. An Evolutionary Approach to Tetris , 2005 .
[32] Preben Alstrøm,et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.
[33] Craig Boutilier,et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res.
[34] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[35] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[36] Gerald DeJong,et al. Qualitative reinforcement learning , 2006, ICML.
[37] Michael L. Littman,et al. Potential-based Shaping in Model-based Reinforcement Learning , 2008, AAAI.
[38] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[39] David Elkind,et al. Learning: An Introduction , 1968 .
[40] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[41] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[42] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[43] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[44] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res.