Autonomous Inter-Task Transfer in Reinforcement Learning Domains

Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. While these methods have had experimental successes and exhibit desirable theoretical properties, the basic learning algorithms are often slow in practice. Much current RL research therefore focuses on speeding up learning by exploiting domain knowledge or by better utilizing an agent's experience. The ambitious goal of transfer learning, when applied to RL tasks, is to accelerate learning on a target task after training on a different, but related, source task.

This dissertation demonstrates that transfer learning methods can successfully improve learning in RL tasks via experience from previously learned tasks, increasing RL's applicability to difficult tasks by allowing agents to generalize their experience across learning problems. It presents inter-task mappings, the first transfer mechanism in this area to successfully enable transfer between tasks with different state variables and actions; inter-task mappings have subsequently been adopted by a number of transfer researchers. Six transfer learning algorithms are then introduced. While these methods differ in which base RL algorithms they are compatible with, what type of knowledge they transfer, and where their strengths lie, all rely on the same inter-task mapping mechanism.

These transfer methods can all use mappings constructed by a human from domain knowledge, but domain knowledge may be unavailable, or insufficient, to describe how two given tasks are related. We therefore also study how inter-task mappings can be learned autonomously by leveraging existing machine learning algorithms. Our methods use classification and regression techniques to discover similarities between data gathered in pairs of tasks, culminating in what is currently one of the most robust mapping-learning algorithms for RL transfer. Combining transfer methods with these similarity-learning algorithms allows us to empirically demonstrate the plausibility of autonomous transfer. We fully implement these methods in four domains, each with different salient characteristics, show that transfer can significantly improve an agent's ability to learn in each domain, and explore the limits of transfer's applicability.
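To make the role of an inter-task mapping concrete, the sketch below is a minimal, hypothetical Python illustration (not the dissertation's implementation) of how a learned source-task value function might seed a target task whose state variables and actions differ. The names `make_transferred_q`, `chi_x`, and `chi_a` are assumptions introduced here for illustration.

```python
from typing import Callable, Dict, Tuple

State = Tuple[float, ...]
Action = str

def make_transferred_q(
    source_q: Callable[[State, Action], float],
    chi_x: Dict[int, int],       # source state-variable index -> target variable supplying its value
    chi_a: Dict[Action, Action],  # target action -> analogous source action
) -> Callable[[State, Action], float]:
    """Build an initial Q-value estimate for the target task from a learned source-task Q."""
    def target_q(state: State, action: Action) -> float:
        # Project the target state onto the source task's (smaller) set of state variables.
        source_state = tuple(state[chi_x[j]] for j in range(len(chi_x)))
        return source_q(source_state, chi_a[action])
    return target_q

# Example: a 3-variable source task seeds a 4-variable target task with an extra action.
source_q = lambda s, a: sum(s) if a == "hold" else 0.0
q0 = make_transferred_q(
    source_q,
    chi_x={0: 0, 1: 1, 2: 3},                # target variable 2 has no source analogue
    chi_a={"hold": "hold", "pass": "hold"},  # a novel target action reuses "hold" values
)
print(q0((1.0, 2.0, 9.9, 3.0), "pass"))      # 6.0, transferred from the source estimate
```

In this sketch the mapping itself is just a pair of dictionaries over state-variable indices and action labels; the transferred estimate only initializes learning in the target task, which then continues with whatever base RL algorithm is in use.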
