Abstraction and Generalization in Reinforcement Learning: A Summary and Framework

In this paper we survey the basics of reinforcement learning, generalization and abstraction. We start with an introduction to the fundamentals of reinforcement learning and motivate the necessity for generalization and abstraction. Next we summarize the most important techniques available to achieve both generalization and abstraction in reinforcement learning. We discuss basic function approximation techniques and delve into hierarchical, relational and transfer learning. All concepts and techniques are illustrated with examples.

[1]  E. Thorndike,et al.  The influence of improvement in one mental function upon the efficiency of other functions. (I). , 1901 .

[2]  Ronald A. Howard,et al.  Dynamic Programming and Markov Processes , 1960 .

[3]  James S. Albus,et al.  Brains, behavior, and robotics , 1981 .

[4]  David Kelley A theory of abstraction. , 1984 .

[5]  R. A. Brooks,et al.  Intelligence without Representation , 1991, Artif. Intell..

[6]  Richard S. Sutton,et al.  Dyna, an integrated architecture for learning, planning, and reacting , 1990, SGAR.

[7]  Andrew W. Moore,et al.  Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.

[8]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[9]  Mahesan Niranjan,et al.  On-line Q-learning using connectionist systems , 1994 .

[10]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[11]  Luc De Raedt,et al.  Inductive Logic Programming: Theory and Methods , 1994, J. Log. Program..

[12]  Claude-Nicolas Fiechter,et al.  Efficient reinforcement learning , 1994, COLT '94.

[13]  Sebastian Thrun,et al.  Is Learning The n-th Thing Any Easier Than Learning The First? , 1995, NIPS.

[14]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[15]  Andrew G. Barto,et al.  Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[16]  Robert Givan,et al.  Model Minimization in Markov Decision Processes , 1997, AAAI/IAAI.

[17]  Ian Frank,et al.  Soccer Server: A Tool for Research on Multiagent Systems , 1998, Appl. Artif. Intell..

[18]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[19]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[20]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[21]  Larry D. Pyeatt,et al.  Decision Tree Function Approximation in Reinforcement Learning , 1999 .

[22]  Marco Wiering,et al.  Explorations in efficient reinforcement learning , 1999 .

[23]  Thomas G. Dietterich An Overview of MAXQ Hierarchical Reinforcement Learning , 2000, SARA.

[24]  Gerhard Weiß A multiagent variant of Dyna-Q , 2000, Proceedings Fourth International Conference on MultiAgent Systems.

[25]  Balaraman Ravindran,et al.  Model Minimization in Hierarchical Reinforcement Learning , 2002, SARA.

[26]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[27]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[28]  Robert C Holte,et al.  Abstraction and reformulation in artificial intelligence. , 2003, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[29]  Jean-Daniel Zucker,et al.  A grounded theory of abstraction in artificial intelligence. , 2003, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[30]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[31]  Pat Langley,et al.  Editorial: On Machine Learning , 1986, Machine Learning.

[32]  Luc De Raedt,et al.  Relational Reinforcement Learning , 2001, Machine Learning.

[33]  A. Barto,et al.  An algebraic approach to abstraction in reinforcement learning , 2004 .

[34]  Michael Kearns,et al.  Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.

[35]  Pierre Geurts,et al.  Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[36]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[37]  Jean-Daniel Zucker,et al.  Abstraction, Reformulation and Approximation, 6th International Symposium, SARA 2005, Airth Castle, Scotland, UK, July 26-29, 2005, Proceedings , 2005, SARA.

[38]  Peter Stone,et al.  Keepaway Soccer: From Machine Learning Testbed to Benchmark , 2005, RoboCup.

[39]  Andrew G. Barto,et al.  Autonomous shaping: knowledge transfer in reinforcement learning , 2006, ICML.

[40]  Vishal Soni,et al.  Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains , 2006, AAAI.

[41]  Thomas J. Walsh,et al.  Towards a Unified Theory of State Abstraction for MDPs , 2006, AI&M.

[42]  Maurice Bruynooghe,et al.  Learning Relational Options for Inductive Transfer in Relational Reinforcement Learning , 2007, ILP.

[43]  Andrew G. Barto,et al.  Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.

[44]  Jude W. Shavlik,et al.  Relational Macros for Transfer in Reinforcement Learning , 2007, ILP.

[45]  Peter Stone,et al.  Model-Based Exploration in Continuous State Spaces , 2007, SARA.

[46]  Peter Stone,et al.  Cross-domain transfer for reinforcement learning , 2007, ICML '07.

[47]  Peter Stone,et al.  Transfer Learning via Inter-Task Mappings for Temporal Difference Learning , 2007, J. Mach. Learn. Res..

[48]  Sridhar Mahadevan,et al.  Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..

[49]  Andrea Bonarini,et al.  Transfer of samples in batch reinforcement learning , 2008, ICML '08.

[50]  Peter Stone,et al.  Transferring Instances for Model-Based Reinforcement Learning , 2008, ECML/PKDD.

[51]  Peter A. Flach,et al.  Evaluation Measures for Multi-class Subgroup Discovery , 2009, ECML/PKDD.

[52]  Satinder P. Singh,et al.  Transfer via soft homomorphisms , 2009, AAMAS.

[53]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[54]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[55]  Kurt Driessens,et al.  Learning with Whom to Communicate Using Relational Reinforcement Learning , 2009, Interactive Collaborative Information Systems.

[56]  Matthew E. Taylor Assisting Transfer-Enabled Machine Learning Algorithms: Leveraging Human Knowledge for Curriculum Design , 2009, AAAI Spring Symposium: Agents that Learn from Human Teachers.

[57]  W. M. Wan,et al.  The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD , 2011 .

[58]  P. Schrimpf,et al.  Dynamic Programming , 2011 .