
Vector-based navigation using grid-like representations in artificial agents

Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go¹,². Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning³–⁵ failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex⁶. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space⁷,⁸ and is critical for integrating self-motion (path integration)⁶,⁷,⁹ and planning direct trajectories to goals (vector-based navigation)⁷,¹⁰,¹¹. Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types¹². We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation⁷,¹⁰,¹¹, demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments.

Grid-like representations emerge spontaneously within a neural network trained to self-localize, enabling the agent to take shortcuts to destinations using vector-based navigation.
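
The two quantities named in the abstract, path integration and the goal vector used in vector-based navigation, can be illustrated with plain geometry. The sketch below is only that: it is not the recurrent network or the reinforcement learning agent described in the paper, and the function names, the `(speed, heading, dt)` sample format, and the example trajectory are all assumptions made for illustration.

```python
import math

def path_integrate(start, steps):
    """Dead-reckon a position from self-motion samples.

    Each sample is a hypothetical (speed, heading_in_radians, dt) tuple.
    """
    x, y = start
    for speed, heading, dt in steps:
        # Accumulate the displacement travelled along the current heading.
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return x, y

def goal_vector(position, goal):
    """Return (distance, bearing) of the straight line from position to goal."""
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

# Illustrative trajectory: one unit east, then one unit north.
steps = [(1.0, 0.0, 1.0), (1.0, math.pi / 2, 1.0)]
estimate = path_integrate((0.0, 0.0), steps)

# Direct "shortcut" vector back to the start, without retracing the path.
distance, bearing = goal_vector(estimate, goal=(0.0, 0.0))
```

Here the position estimate comes only from integrated self-motion, and the return vector is the Euclidean difference between the goal and that estimate. That difference is the kind of metric quantity the abstract says is derived from grid-like units.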

Razvan Pascanu | Raia Hadsell | Fabio Viola | Demis Hassabis | Koray Kavukcuoglu | Martin J. Chadwick | Charles Blundell | Benigno Uria | Alexander Pritzel | Caswell Barry | Dharshan Kumaran | Timothy Lillicrap | Greg Wayne | Thomas Degris | Andrea Banino | Brian Zhang | Joseph Modayil | Stig Petersen | Hubert Soyer | Ross Goroshin | Piotr Mirowski | Neil C. Rabinowitz | Amir Sadik | Charlie Beattie | Stephen Gaffney | Helen King

[1] G. Schwarz. Estimating the Dimension of a Model, 1978.

[2] S. Beucher. Use of watersheds in contour detection, 1979.

[3] Geoffrey E. Hinton, et al. A Learning Algorithm for Boltzmann Machines, 1985, Cogn. Sci.

[4] John S. Bridle, et al. Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters, 1989, NIPS.

[5] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.

[6] David J. C. MacKay, et al. A Practical Bayesian Framework for Backpropagation Networks, 1992, Neural Computation.

[7] D. S. Touretzky, et al. Theory of rodent navigation based on interacting representations of space, 1996, Hippocampus.

[8] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[9] Olejnik, et al. Measures of Effect Size for Comparative Studies: Applications, Interpretations, and Limitations, 2000, Contemporary educational psychology.

[10] David J. Foster, et al. A model of hippocampally dependent navigation, using the temporal difference learning rule, 2000, Hippocampus.

[11] Hugh F. Durrant-Whyte, et al. A solution to the simultaneous localization and map building (SLAM) problem, 2001, IEEE Trans. Robotics Autom.

[12] J. Bassett, et al. Neural Correlates for Angular Head Velocity in the Rat Dorsal Tegmental Nucleus, 2001, The Journal of Neuroscience.

[13] H. Mittelstaedt, et al. Homing by path integration in a mammal, 1980, Naturwissenschaften.

[14] T. Hafting, et al. Microstructure of a spatial map in the entorhinal cortex, 2005, Nature.

[15] Torkel Hafting, et al. Conjunctive Representation of Position, Direction, and Velocity in Entorhinal Cortex, 2006, Science.

[16] Bruce L. McNaughton, et al. Path integration and the neural basis of the 'cognitive map', 2006, Nature Reviews Neuroscience.

[17] Mark C. Fuhs, et al. A Spin Glass Model of Path Integration in Rat Medial Entorhinal Cortex, 2006, The Journal of Neuroscience.

[18] J. O’Keefe, et al. An oscillatory interference model of grid cell firing, 2007, Hippocampus.

[19] Lisa M. Giocomo, et al. Grid cell firing may arise from interference of theta frequency membrane potential oscillations in single neurons, 2007, Hippocampus.

[20] K. Jeffery, et al. Experience-dependent rescaling of entorhinal grids, 2007, Nature Neuroscience.

[21] Gordon Wyeth, et al. Mapping a Suburb With a Single Camera Using a Biologically Inspired SLAM System, 2008, IEEE Transactions on Robotics.

[22] M. Moser, et al. Representation of Geometric Borders in the Entorhinal Cortex, 2008, Science.

[23] Ila R. Fiete, et al. What Grid Cells Convey about Rat Location, 2008, The Journal of Neuroscience.

[24] Yoram Burak, et al. Accurate Path Integration in Continuous Attractor Network Models of Grid Cells, 2009.

[25] Christian F. Doeller, et al. Evidence for grid cells in a human memory network, 2010, Nature.

[26] Edvard I. Moser, et al. Development of the Spatial Representation System in the Rat, 2010, Science.

[27] Thomas J. Wills, et al. Development of the Hippocampal Cognitive Map in Preweanling Rats, 2010, Science.

[28] Nathaniel D. Daw, et al. Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning, 2011, PLoS Comput. Biol.

[29] M. Yartsev, et al. Grid cells without theta oscillations in the entorhinal cortex of bats, 2011, Nature.

[30] Martin Stemmler, et al. Optimal Population Codes for Space: Grid Cells Outperform Place Cells, 2012, Neural Computation.

[31] May-Britt Moser, et al. The entorhinal grid map is discretized, 2012, Nature.

[32] André A. Fenton, et al. Linear Look-Ahead in Conjunctive Cells: An Entorhinal Mechanism for Vector-Based Navigation, 2012, Front. Neural Circuits.

[33] Michael E. Hasselmo, et al. Modeling Boundary Vector Cell Firing Given Optic Flow as a Cue, 2012, PLoS Comput. Biol.

[34] John A. King, et al. How vision and movement combine in the hippocampal place code, 2012, Proceedings of the National Academy of Sciences.

[35] Uğur M. Erdem, et al. A goal-directed spatial navigation model using forward trajectory planning based on grid cells, 2012, The European journal of neuroscience.

[36] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.

[37] M. Moser, et al. Optogenetic Dissection of Entorhinal-Hippocampal Functional Connectivity, 2013, Science.

[38] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[39] Ila R. Fiete, et al. How does the brain solve the computational problems of spatial navigation, 2014.

[40] K. Jeffery, et al. Weighted cue integration in the rodent head direction system, 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.

[41] M. Botvinick, et al. Supplemental Information for Design Principles of the Hippocampal Cognitive Map, 2014.

[42] C. Barry, et al. Neural Mechanisms of Self-Location, 2014, Current Biology.

[43] Alexander Mathis, et al. Connecting multiple spatial scales to decode the population activity of grid cells, 2015, Science Advances.

[44] Yoshua Bengio, et al. Towards Biologically Plausible Deep Learning, 2015, ArXiv.

[45] D. Hassabis, et al. A Goal Direction Signal in the Human Entorhinal/Subicular Region, 2015, Current Biology.

[46] Surya Ganguli, et al. Environmental Boundaries as an Error Correction Mechanism for Grid Cells, 2015, Neuron.

[47] Neil Burgess, et al. Using Grid Cells for Navigation, 2015, Neuron.

[48] D. Curran-Everett, et al. The fickle P value generates irreproducible results, 2015, Nature Methods.

[49] Edvard I. Moser, et al. Speed cells in the medial entorhinal cortex, 2015, Nature.

[50] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.

[51] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[52] Ron Meir, et al. Extracting grid cell characteristics from place cell inputs using non-negative principal component analysis, 2016, eLife.

[53] Konrad P. Körding, et al. Toward an Integration of Deep Learning and Neuroscience, 2016, bioRxiv.

[54] Honglak Lee, et al. Control of Memory, Active Perception, and Action in Minecraft, 2016, ICML.

[55] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[56] Yoshua Bengio, et al. Towards a Biologically Plausible Backprop, 2016, ArXiv.

[57] Sergio Gomez Colmenarejo, et al. Hybrid computing using a neural network with dynamic external memory, 2016, Nature.

[58] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[59] Shane Legg, et al. DeepMind Lab, 2016, ArXiv.

[60] C. Barry, et al. To be a Grid Cell: Shuffling procedures for determining “Gridness”, 2017, bioRxiv.

[61] Nachum Ulanovsky, et al. Vectorial representation of spatial goals in the hippocampus of bats, 2017, Science.

[62] Christopher Joseph Pal, et al. Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks, 2017, ArXiv.

[63] Razvan Pascanu, et al. Learning to Navigate in Complex Environments, 2016, ICLR.