Knowledge Extraction from Learning Traces in Continuous Domains

A method is introduced to extract and transfer knowledge between a source task and a target task in continuous domains, for use with direct policy search algorithms. The principle is to (1) run a direct policy search on the source task, (2) extract knowledge from the resulting learning traces, and (3) transfer this knowledge through a reward shaping approach. The knowledge extraction process consists of analyzing the learning traces, i.e. the behaviors explored while learning on the source task, to identify the behavioral features specific to successful solutions. Each behavioral feature is then assigned a value corresponding to the average reward obtained by the individuals exhibiting it. These values are used to shape rewards while learning on the target task. The approach is tested on a simulated ball-collecting task in a continuous arena, and the behavior of an individual is analyzed with the help of the generated knowledge bases.
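The extraction and shaping steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trace format (a set of discretized behavioral features paired with the reward of the individual that produced them), the feature names, and the shaping weight are all assumptions made for the example.

```python
from collections import defaultdict

def extract_knowledge(learning_traces):
    """Build a knowledge base mapping each behavioral feature to the
    average reward of the individuals that exhibited it on the source task.

    learning_traces: iterable of (features, reward) pairs, where
    features is a set of discretized behavioral features (assumed format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for features, reward in learning_traces:
        for f in features:
            totals[f] += reward
            counts[f] += 1
    return {f: totals[f] / counts[f] for f in totals}

def shaped_reward(base_reward, features, knowledge_base, weight=0.1):
    """Shape the target-task reward with a bonus proportional to the
    stored values of the behavioral features currently exhibited."""
    bonus = sum(knowledge_base.get(f, 0.0) for f in features)
    return base_reward + weight * bonus

# Hypothetical source-task traces from a ball-collecting scenario.
traces = [
    ({"near_ball", "carrying_ball"}, 1.0),
    ({"near_wall"}, 0.0),
    ({"near_ball"}, 0.5),
]
kb = extract_knowledge(traces)
# "near_ball" was seen with rewards 1.0 and 0.5, so its value is 0.75.
reward = shaped_reward(0.2, {"near_ball"}, kb, weight=0.1)
```

The knowledge base is task-agnostic in the sense that only behavioral features, not policies, are transferred; the shaping bonus then biases the target-task search toward behaviors that were associated with high rewards on the source task.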
