Do Neural Language Representations Learn Physical Commonsense?

Humans understand language based on rich background knowledge about how the physical world works, which in turn allows us to reason about the physical world through language. Beyond knowing the properties of objects (e.g., boats require fuel) and their affordances, i.e., the actions that can be applied to them (e.g., boats can be driven), we can also draw if-then inferences linking the two: for example, if something can be driven, it likely requires fuel. In this paper, we investigate the extent to which state-of-the-art neural language representations, trained on vast amounts of natural language text, demonstrate physical commonsense reasoning. Although recent neural language models have achieved strong performance on a variety of natural language inference tasks, our study, based on a dataset of over 200k newly collected annotations, suggests that neural language representations still only learn associations that are explicitly written down.
