Reinforcement-Driven Shaping of Sequence Learning in Neural Dynamics

We present a simulated model of a mobile KUKA youBot that uses Dynamic Field Theory for its underlying perceptual and motor-control systems while learning behavioral sequences through reinforcement learning. Although dynamic neural fields have previously been used for robust control in robotics, high-level behavior has generally been pre-programmed by hand. In the present work we extend a recent framework for integrating reinforcement learning and dynamic neural fields by applying the principle of shaping, in order to reduce the search space of the learning agent.
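The shaping idea can be illustrated with a minimal, hypothetical sketch that is not the paper's actual architecture (which couples reinforcement learning to dynamic neural fields): potential-based reward shaping, in the sense of Ng et al., added to tabular Q-learning on a toy chain task. The potential function `phi`, the chain environment, and all parameter values below are illustrative assumptions.

```python
import numpy as np

def q_learning(n_states=10, episodes=200, alpha=0.5, gamma=0.9,
               eps=0.2, shaping=True, seed=0):
    """Tabular Q-learning on a 1-D chain with the goal at the right end.

    With shaping enabled, the reward is augmented by the potential-based
    term F(s, s') = gamma * phi(s') - phi(s), where phi(s) = s / n_states
    measures progress toward the goal. Potential-based shaping of this
    form provably preserves the optimal policy while densifying the
    reward signal, which shrinks the effective search space.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))            # actions: 0 = left, 1 = right
    phi = lambda s: s / n_states           # illustrative potential function
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(4 * n_states):      # episode step limit
            # epsilon-greedy action selection
            a = rng.integers(2) if rng.random() < eps else int(q[s].argmax())
            s2 = max(s - 1, 0) if a == 0 else min(s + 1, goal)
            r = 1.0 if s2 == goal else 0.0  # sparse base reward
            if shaping:
                r += gamma * phi(s2) - phi(s)
            # standard Q-learning update
            q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
            s = s2
            if s == goal:
                break
    return q

q = q_learning()
policy = q.argmax(axis=1)  # greedy action per state: should be "right" everywhere
```

Because the shaping term immediately penalizes moves away from the goal, the agent needs far fewer exploratory episodes than with the sparse base reward alone, while the greedy policy that emerges is the same one the unshaped problem defines.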
