Biologically inspired reinforcement learning for mobile robot collision avoidance

Collision avoidance is a key enabling technology for applications such as autonomous vehicles and mobile robots. Reinforcement learning techniques, such as the popular Q-learning algorithm, have emerged as a promising solution for collision avoidance in robotics. Although spiking neural networks (SNNs), the third generation of neural network models, have attracted growing interest because they more closely resemble biological neural circuits in the brain, their application to mobile robot navigation has not been well studied. In the context of reinforcement learning, this paper investigates the potential of biologically motivated SNNs for goal-directed collision avoidance in reasonably complex environments. In contrast to the existing additive reward-modulated spike-timing-dependent plasticity learning rule (A-RM-STDP), we explore, for the first time, a multiplicative RM-STDP scheme (M-RM-STDP) for this application. Furthermore, we propose a more biologically plausible feed-forward SNN architecture with fine-grained global rewards. Finally, by combining these two techniques, we demonstrate a further improved solution to collision avoidance. Our proposed approaches not only outperform Q-learning in cases where Q-learning can hardly reach the target without collision, but also significantly outperform a baseline SNN with A-RM-STDP in terms of both success rate and the quality of navigation trajectories.
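To illustrate the distinction between additive and multiplicative reward-modulated STDP, the following minimal Python sketch contrasts two generic weight updates. It is not the paper's formulation: the function names, the scalar `eligibility` trace, the learning rate `lr`, and the bounds `w_min` and `w_max` are all assumptions chosen for illustration, and the multiplicative form shown is the common soft-bound variant, in which the step is scaled by the distance of the weight from its bound.

```python
def additive_rm_stdp(w, eligibility, reward, lr=0.01, w_min=0.0, w_max=1.0):
    """Additive rule (illustrative): the step does not depend on the current weight.

    Hard clipping is needed to keep the weight inside [w_min, w_max].
    """
    dw = lr * reward * eligibility
    return min(w_max, max(w_min, w + dw))


def multiplicative_rm_stdp(w, eligibility, reward, lr=0.01, w_min=0.0, w_max=1.0):
    """Multiplicative rule (illustrative soft-bound variant, an assumption here).

    The step shrinks as the weight approaches a bound, so no clipping is
    needed as long as lr * |reward * eligibility| <= 1.
    """
    drive = reward * eligibility
    if drive >= 0:
        dw = lr * drive * (w_max - w)  # potentiation scaled by remaining headroom
    else:
        dw = lr * drive * (w - w_min)  # depression scaled by distance above the floor
    return w + dw


if __name__ == "__main__":
    # Same reward-gated event applied to a weight that is already near w_max.
    print(additive_rm_stdp(0.9, eligibility=1.0, reward=1.0, lr=0.5))
    print(multiplicative_rm_stdp(0.9, eligibility=1.0, reward=1.0, lr=0.5))
```

In this sketch the additive update saturates at the bound after one large step, whereas the multiplicative update approaches the bound gradually. That weight-dependent scaling is the general property that separates the two families of rules.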
