Points, Paths, and Playscapes: Large-scale Spatial Language Understanding Tasks Set in the Real World

Spatial language understanding is important both for practical applications and as a building block for better abstract language understanding. Much progress has been made on understanding spatial relations and values in images and texts, and on giving and following navigation instructions in restricted domains. We argue that the next big advances in spatial language understanding are best supported by creating large-scale datasets that focus on points and paths grounded in the real world, and then extending these datasets into online, persistent playscapes that mix human and bot players, where the bot players must learn, evolve, and survive according to their depth of understanding of scenes, navigation, and interactions.
