Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning

Extensive studies have shown that the ability of many animals to form spatial representations for self-localization, path planning, and navigation relies on place and head-direction (HD) cells in the hippocampal formation. Although there are numerous hippocampal modeling approaches, only a few span the full range of functionality, from processing raw sensory signals to planning and action generation. This paper presents a vision-based navigation system that learns place and HD cell representations from visual images, builds topological maps from the learned representations, and performs navigation with hierarchical reinforcement learning. First, place and HD cell representations are learned from sequences of visual stimuli in an unsupervised fashion. A modified Slow Feature Analysis (SFA) algorithm is proposed that learns each cell type deliberately by restricting its training to a separate phase of spatial exploration. Then, to extract the metric information encoded in these unsupervised representations, a self-organizing learning algorithm is applied to the emergent cell activities to generate topological maps that capture, respectively, the topology of the environment and the robot's head direction. These maps enable the robot to perform self-localization and orientation detection. Finally, goal-directed navigation is performed with reinforcement learning in a continuous state space represented by the population activity of the place cells. In particular, because the topological map provides a natural hierarchical representation of the environment, hierarchical reinforcement learning (HRL) is used to exploit this hierarchy and accelerate learning. The HRL operates on two spatial scales: a high-level policy learns to select subgoals, and a low-level policy learns over primitive actions to specialize on reaching the selected subgoals. Experimental results demonstrate that the system navigates the robot to desired positions effectively, and that HRL learns substantially faster than standard RL on our navigation tasks.
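To make the pipeline concrete, the two sketches below illustrate the kind of computation each stage involves. They are minimal illustrations under stated assumptions, not the paper's implementation.

The first sketch shows standard linear Slow Feature Analysis, the core of the unsupervised stage: whiten the input signal, then keep the projections whose outputs vary most slowly over time. The paper's modification (restricting each cell type's learning to a separate exploration phase) would amount to controlling which input sequences `X` are fed in; all names here are illustrative.

```python
import numpy as np

def linear_sfa(X, n_components=16):
    """Minimal linear Slow Feature Analysis.

    X: (T, D) array of temporally ordered inputs (e.g. visual features).
    Returns W of shape (D, n_components), projecting centered inputs
    onto the slowest decorrelated output signals.
    """
    # Center and whiten so the outputs have unit variance and are
    # decorrelated (the SFA constraints).
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10                        # drop near-singular directions
    S = eigvec[:, keep] / np.sqrt(eigval[keep])  # whitening matrix
    Z = Xc @ S
    # Slowness objective: minimize the variance of the temporal derivative.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / len(dZ)
    dval, dvec = np.linalg.eigh(dcov)
    # Eigenvectors with the smallest eigenvalues are the slowest features.
    return S @ dvec[:, :n_components]

# Usage: W = linear_sfa(image_features); the slow outputs are
# (image_features - image_features.mean(0)) @ W.
```

The second sketch illustrates the two-level decomposition described in the abstract: a high-level value function over topological-map nodes and subgoals, and a low-level value function over primitive actions conditioned on the active subgoal, with an SMDP-style discounted update at the high level. This is a generic tabular, options-style rendering; class and method names are hypothetical.

```python
import random
from collections import defaultdict

class HierarchicalAgent:
    """Two-level tabular Q-learning sketch (options-style HRL)."""

    def __init__(self, subgoals, actions, eps=0.1, alpha=0.1, gamma=0.99):
        self.subgoals, self.actions = subgoals, actions
        self.q_hi = defaultdict(float)  # (map_node, subgoal)      -> value
        self.q_lo = defaultdict(float)  # (subgoal, state, action) -> value
        self.eps, self.alpha, self.gamma = eps, alpha, gamma

    def _greedy(self, q, key_fn, choices):
        # Epsilon-greedy selection shared by both levels.
        if random.random() < self.eps:
            return random.choice(choices)
        return max(choices, key=lambda c: q[key_fn(c)])

    def choose_subgoal(self, node):
        return self._greedy(self.q_hi, lambda g: (node, g), self.subgoals)

    def choose_action(self, subgoal, state):
        return self._greedy(self.q_lo, lambda a: (subgoal, state, a), self.actions)

    def update_low(self, subgoal, s, a, r, s_next):
        # One-step Q-learning over primitive actions for the active subgoal.
        best = max(self.q_lo[(subgoal, s_next, a2)] for a2 in self.actions)
        td = r + self.gamma * best - self.q_lo[(subgoal, s, a)]
        self.q_lo[(subgoal, s, a)] += self.alpha * td

    def update_high(self, node, subgoal, option_return, node_next, steps):
        # SMDP-style update: discount by the number of primitive steps
        # the subgoal-reaching option took before terminating.
        best = max(self.q_hi[(node_next, g)] for g in self.subgoals)
        td = option_return + self.gamma ** steps * best - self.q_hi[(node, subgoal)]
        self.q_hi[(node, subgoal)] += self.alpha * td
```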
