Representation Learning for Grounded Spatial Reasoning

The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards. The proposed model learns a representation of the world steered by instruction text. This design allows for precise alignment of local neighborhoods with corresponding verbalizations, while also handling global references in the instructions. We train our model with reinforcement learning using a variant of generalized value iteration. The model outperforms state-of-the-art approaches on several metrics, yielding a 45% reduction in goal localization error.
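As a point of reference for the value-iteration component mentioned above, the following is a minimal sketch of plain tabular value iteration on a small grid. It is not the proposed model: the reward map here is hand-specified, whereas the abstract describes a learned representation steered by instruction text. The grid size, goal cell, discount factor, deterministic moves, and the names value_iteration and greedy_path are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]

# Assumed action set for this sketch: stay in place or move to a 4-neighbour.
ACTIONS: List[Cell] = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def neighbours(pos: Cell, rows: int, cols: int) -> List[Cell]:
    """Cells reachable in one step; moves off the grid leave the agent in place."""
    r, c = pos
    result = []
    for dr, dc in ACTIONS:
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols):
            nr, nc = r, c
        result.append((nr, nc))
    return result


def value_iteration(reward: Grid, gamma: float = 0.9, sweeps: int = 50) -> Grid:
    """Repeatedly apply V(s) = R(s) + gamma * max over successors s' of V(s')."""
    rows, cols = len(reward), len(reward[0])
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(sweeps):
        updated = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                best_next = max(values[nr][nc] for nr, nc in neighbours((r, c), rows, cols))
                updated[r][c] = reward[r][c] + gamma * best_next
        values = updated
    return values


def greedy_path(values: Grid, start: Cell) -> List[Cell]:
    """Follow the highest-valued successor until staying put is the best option."""
    rows, cols = len(values), len(values[0])
    path = [start]
    for _ in range(rows * cols):  # safety bound on path length
        best = max(neighbours(path[-1], rows, cols), key=lambda p: values[p[0]][p[1]])
        if best == path[-1]:
            break
        path.append(best)
    return path


# Hypothetical 4x5 world with a single rewarded goal cell at row 2, column 3.
reward_map = [[0.0] * 5 for _ in range(4)]
reward_map[2][3] = 1.0
value_map = value_iteration(reward_map)
print(greedy_path(value_map, (0, 0)))
```

In this sketch the value map peaks at the rewarded cell and decays with distance from it, so following the value gradient leads to the goal. In the setting the abstract describes, the goal is not given directly and would have to be inferred jointly from the instruction and the environment.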
