Knowledge Transfer Using Local Features

We present a method for reducing the effort required to compute policies for new tasks by reusing solutions to previously solved tasks. The key idea is to use a learned intermediate policy, defined over local features, to create an initial policy for the new task. To further improve this initial policy, we develop a form of generalized policy iteration. When previous experience is available, we achieve a substantial reduction in the computation needed to find policies.
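
The following minimal Python sketch illustrates one way such a transfer could work: a policy learned on a solved task is tabulated against local features, used to seed the policy for a new task, and then refined with ordinary policy iteration. The toy gridworld, the feature definition, and all function names here are our own assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical illustration of feature-based policy transfer.
# States are (row, col) cells; grid[r, c] == 1 marks a wall.

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def local_features(grid, s):
    """4-neighbourhood occupancy around state s (1 = wall or out of bounds)."""
    r, c = s
    feats = []
    for dr, dc in ACTIONS:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < grid.shape[0] and 0 <= cc < grid.shape[1]
        feats.append(0.0 if inside and grid[rr, cc] == 0 else 1.0)
    return tuple(feats)

def learn_feature_policy(grid, policy):
    """Tabulate the action chosen for each local-feature vector seen on a solved task."""
    return {local_features(grid, s): a for s, a in policy.items()}

def initial_policy_from_features(grid, states, table, default=0):
    """Seed a new task's policy by looking up each state's local features."""
    return {s: table.get(local_features(grid, s), default) for s in states}

def policy_iteration(grid, goal, policy, gamma=0.9, iters=50):
    """Plain policy iteration, warm-started from the transferred policy."""
    V = {s: 0.0 for s in policy}

    def step(s, a):
        if s == goal:                 # goal is absorbing with zero reward
            return s, 0.0
        s2 = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
        if s2 not in V:               # blocked moves leave the state unchanged
            s2 = s
        return s2, (0.0 if s2 == goal else -1.0)

    for _ in range(iters):
        for _ in range(20):           # approximate policy evaluation sweeps
            for s in V:
                s2, rwd = step(s, policy[s])
                V[s] = rwd + gamma * V[s2]
        stable = True                 # greedy policy improvement
        for s in V:
            q = [rwd + gamma * V[s2] for s2, rwd in (step(s, a) for a in range(len(ACTIONS)))]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            break
    return policy
```

Warm-starting policy iteration from the feature-based policy should need fewer improvement sweeps than starting from an arbitrary policy, which mirrors the computational savings the abstract claims.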
