Reinforcement and shaping in learning action sequences with neural dynamics

Neural dynamics offers a theoretical and computational framework in which cognitive architectures may be developed that are suitable both for modelling the psychophysics of human behaviour and for controlling robotic behaviour. Recently, we introduced reinforcement learning into this framework, allowing an agent to learn goal-directed sequences of behaviours based on a reward signal perceived at the end of a sequence. Although the stability of the dynamic neural fields and of the behavioural organisation allowed us to demonstrate autonomous learning in the robotic system, learning longer sequences took prohibitively long. Here, we combine neural-dynamic reinforcement learning with shaping, which consists in providing intermediate rewards and thereby accelerates learning. We implemented the new learning algorithm on a simulated KUKA youBot robot and evaluated the robustness and efficacy of learning in a pick-and-place task.
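
The shaping idea can be illustrated with a minimal, self-contained sketch. The toy below is not the paper's neural-dynamic implementation (which uses dynamic neural fields and behavioural organisation); it is a tabular SARSA learner on a hypothetical five-step pick-and-place sequence, showing how intermediate subgoal rewards shorten the credit-assignment chain compared with a single terminal reward. All identifiers (ACTIONS, SHAPING, the reward values, and so on) are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

# Toy pick-and-place sequence: the agent must emit these actions in order.
# Hypothetical names; the real task runs on a simulated KUKA youBot.
ACTIONS = ["reach", "grasp", "lift", "transport", "place"]
CORRECT = ACTIONS  # the rewarded sequence is each action in order

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
SHAPING = True  # if True, add small intermediate rewards at each subgoal

def step(state, action):
    """state = index of the next correct action; returns (next_state, reward, done)."""
    if action == CORRECT[state]:
        next_state = state + 1
        done = next_state == len(CORRECT)
        # Terminal reward only, or additional small subgoal rewards (shaping).
        reward = 1.0 if done else (0.2 if SHAPING else 0.0)
        return next_state, reward, done
    return 0, 0.0, True  # a wrong action aborts the episode

Q = defaultdict(float)  # tabular action values, keyed by (state, action)

def policy(state):
    """Epsilon-greedy action selection over the tabular Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(2000):
    state, action = 0, policy(0)
    done = False
    while not done:
        next_state, reward, done = step(state, action)
        next_action = policy(next_state) if not done else None
        # On-policy SARSA update toward the (possibly shaped) reward.
        target = reward + (0.0 if done else GAMMA * Q[(next_state, next_action)])
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state, action = next_state, next_action

# Greedy policy per sequence position after learning.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(len(CORRECT))})
```

With SHAPING set to True, the greedy policy typically converges to the correct five-step sequence in far fewer episodes than with the terminal reward alone, which is the acceleration effect the abstract describes, transposed here to a deliberately simplified tabular setting.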
