Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation

Introducing common sense into natural language understanding systems has received increasing research attention, yet how to evaluate whether a system has sense-making capability remains a fundamental open question. Existing benchmarks measure commonsense knowledge either indirectly or without explicit reasoning. In this paper, we release a benchmark that directly tests whether a system can differentiate natural language statements that make sense from those that do not. In addition, a system is asked to identify the most crucial reason why a given statement does not make sense. We evaluate models trained on large-scale language modeling tasks as well as human performance, showing that sense-making poses distinct challenges for systems.
