Apprenticeship Learning for Initial Value Functions in Reinforcement Learning