Explanation-Based Learning and Reinforcement Learning: A Unified View
