Do Language Models Have Common Sense?

It has been argued that current machine learning models do not have common sense and therefore must be hard-coded with prior knowledge (Marcus, 2018). Here we show surprising evidence that language models can already learn to capture certain commonsense knowledge. Our key observation is that a language model can compute the probability of any statement, and this probability can be used to evaluate the statement's truthfulness. On the Winograd Schema Challenge (Levesque et al., 2011), language models achieve accuracy 11% higher than previous state-of-the-art supervised methods. Language models can also be fine-tuned for the task of mining commonsense knowledge on ConceptNet, achieving F1 scores of 0.912 and 0.824 and outperforming previous best results (Jastrzebski et al., 2018). Further analysis demonstrates that language models can discover, without explicit supervision, unique features of Winograd Schema contexts that determine the correct answers.
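
The scoring idea described above can be made concrete with a short sketch. The snippet below is a minimal illustration rather than the paper's implementation: it assumes the Hugging Face transformers library and uses a small pretrained GPT-2 as a stand-in for the paper's language-model ensemble, and it applies simple full-sentence scoring (summed token log-probabilities) to pick the more plausible pronoun substitution in a Winograd-style sentence.

```python
# Minimal sketch: rank Winograd Schema candidates by language-model probability.
# Assumes the Hugging Face `transformers` library and a pretrained GPT-2 as a
# stand-in for the language models used in the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Sum of next-token log-probabilities of `sentence` under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids  # shape [1, T]
    with torch.no_grad():
        logits = model(ids).logits                            # shape [1, T, V]
    # Token t is predicted from tokens < t: align logits with next-token targets.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = ids[:, 1:].unsqueeze(-1)
    return log_probs.gather(2, targets).sum().item()

# Winograd-style resolution: substitute each candidate for the pronoun and
# keep the candidate whose sentence the language model scores as more probable.
context = "The trophy doesn't fit in the suitcase because {} is too big."
candidates = ["the trophy", "the suitcase"]
scores = {c: sentence_log_prob(context.format(c)) for c in candidates}
print(max(scores, key=scores.get))  # expected answer: "the trophy"
```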

References

[1] Rico Sennrich et al. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures, 2018, EMNLP.
[2] Jackie Chi Kit Cheung et al. Commonsense mining as knowledge base completion? A study on the impact of novelty, 2018, ArXiv.
[3] Luke S. Zettlemoyer et al. Deep Contextualized Word Representations, 2018, NAACL.
[4] Sebastian Ruder et al. Fine-tuned Language Models for Text Classification, 2018, ArXiv.
[5] Gary Marcus et al. Deep Learning: A Critical Appraisal, 2018, ArXiv.
[6] Alec Radford et al. Improving Language Understanding by Generative Pre-Training, 2018.
[7] Yejin Choi et al. The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task, 2017, CoNLL.
[8] Yu Hu et al. Combing Context and Commonsense Knowledge Through Neural Networks for Solving Winograd Schema Problems, 2017, AAAI Spring Symposia.
[9] Quoc V. Le et al. Unsupervised Pretraining for Sequence to Sequence Learning, 2016, EMNLP.
[10] Hai Wang et al. Broad Context Language Modeling as Reading Comprehension, 2016, EACL.
[11] Xiang Li et al. Commonsense Knowledge Base Completion, 2016, ACL.
[12] Sandro Pezzelle et al. The LAMBADA dataset: Word prediction requiring a broad discourse context, 2016, ACL.
[13] Jian Zhang et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[14] Nathanael Chambers et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories, 2016, NAACL.
[15] Yonghui Wu et al. Exploring the Limits of Language Modeling, 2016, ArXiv.
[16] Dan Roth et al. Solving Hard Coreference Problems, 2015, NAACL.
[17] Quoc V. Le et al. Semi-supervised Sequence Learning, 2015, NIPS.
[18] Ernest Davis et al. Commonsense reasoning and commonsense knowledge in artificial intelligence, 2015, Commun. ACM.
[19] Chitta Baral et al. Towards Addressing the Winograd Schema Challenge - Building and Using a Semantic Parser and a Knowledge Hunting Module, 2015, IJCAI.
[20] Yuliya Lierler et al. The Winograd Schema Challenge and Reasoning about Correlation, 2015, AAAI Spring Symposia.
[21] Christopher D. Manning et al. NaturalLI: Natural Logic Inference for Common Sense Reasoning, 2014, EMNLP.
[22] Peter Schüller et al. Tackling Winograd Schemas by Formalizing Relevance Theory in Knowledge Graphs, 2014, KR.
[23] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[24] Jeffrey Dean et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[25] Vincent Ng et al. Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge, 2012, EMNLP.
[26] Hector J. Levesque et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[27] Yoram Singer et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[28] Hugo Liu et al. ConceptNet — A Practical Commonsense Reasoning Tool-Kit, 2004.
[29] Jürgen Schmidhuber et al. Long Short-Term Memory, 1997, Neural Computation.
[30] Douglas B. Lenat et al. CYC: a large-scale investment in knowledge infrastructure, 1995, CACM.
[31] George A. Miller et al. WordNet: A Lexical Database for English, 1995, HLT.