Paper information - ESPRIT: Explaining Solutions to Physical Reasoning Tasks

ESPRIT: Explaining Solutions to Physical Reasoning Tasks

Neural networks lack the ability to reason about qualitative physics and so cannot generalize to scenarios and tasks unseen during training. We propose ESPRIT, a framework for commonsense reasoning about qualitative physics in natural language that generates interpretable descriptions of physical events. We use a two-step approach: first identifying the pivotal physical events in an environment, then generating natural language descriptions of those events using a data-to-text approach. Our framework learns to generate explanations of how the physical simulation will causally evolve, so that an agent or a human can easily reason about a solution using those interpretable descriptions. Human evaluations indicate that ESPRIT produces crucial fine-grained details and has high coverage of physical concepts, even when compared with human annotations. Dataset, code and documentation are available at this https URL.
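
The two-step structure described in the abstract (select pivotal events, then turn them into text) can be illustrated with a toy sketch. This is not the ESPRIT implementation: the abstract describes a learned framework, whereas the sketch below swaps in a hand-written filter and fixed sentence templates. The `Event` record, the `goal_objects` heuristic, the event kinds, and the template wording are all hypothetical names and choices introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    """One row of a hypothetical simulation log (not a format from the paper)."""
    step: int      # simulation time step
    kind: str      # e.g. "collision" or "push"
    subject: str   # object that acts
    target: str    # object acted upon


def select_pivotal(events: List[Event], goal_objects: Set[str]) -> List[Event]:
    """Step 1 (toy heuristic): keep events that involve a goal-relevant object,
    ordered by time. A learned selector would replace this filter."""
    kept = [e for e in events
            if e.subject in goal_objects or e.target in goal_objects]
    return sorted(kept, key=lambda e: e.step)


# Step 2 (toy stand-in for a learned data-to-text model): one template per kind.
TEMPLATES = {
    "collision": "At step {step}, the {subject} collides with the {target}.",
    "push": "At step {step}, the {subject} pushes the {target}.",
}
FALLBACK = "At step {step}, the {subject} interacts with the {target}."


def describe(events: List[Event]) -> List[str]:
    """Render each structured event as one sentence."""
    sentences = []
    for e in events:
        template = TEMPLATES.get(e.kind, FALLBACK)
        sentences.append(
            template.format(step=e.step, subject=e.subject, target=e.target)
        )
    return sentences


def explain(events: List[Event], goal_objects: Set[str]) -> str:
    """Run both steps and join the sentences into one description."""
    return " ".join(describe(select_pivotal(events, goal_objects)))


if __name__ == "__main__":
    log = [
        Event(12, "collision", "red ball", "green ball"),
        Event(5, "collision", "gray block", "floor"),
        Event(30, "push", "green ball", "blue ball"),
    ]
    print(explain(log, {"green ball", "blue ball"}))
```

Keeping selection and rendering as separate functions mirrors the two-step split in the abstract, so either half could be replaced by a trained model without touching the other.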
