Encoding Spatial Relations from Natural Language

Natural language processing has made significant inroads into learning the semantics of words through distributional approaches; however, representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and that lacks invariance to viewpoint changes. We present a system capable of capturing the semantics of spatial relations such as behind, left of, etc., from natural language. Our key contributions are a novel multi-modal objective based on generating images of scenes from their textual descriptions, and a new dataset on which to train it. We demonstrate that the system's internal representations are robust to meaning-preserving transformations of descriptions (paraphrase invariance), while viewpoint invariance is an emergent property of the system.
