AR-LSAT: Investigating Analytical Reasoning of Text

Analytical reasoning is an essential and challenging task that requires a system to analyze a scenario involving a particular set of circumstances and reason over it to draw conclusions. In this paper, we study the challenge of analytical reasoning over text and introduce a new dataset consisting of questions from the Law School Admission Test (LSAT) from 1991 to 2016. We analyze what knowledge, understanding, and reasoning abilities are required to do well on this task. To address this reasoning challenge, we design two baselines: (1) a Transformer-based method that leverages state-of-the-art pre-trained language models, and (2) the Analytical Reasoning Machine (ARM), a logic-level reasoning framework that extracts symbolic knowledge (e.g., participants, facts, and logical functions) to deduce legitimate solutions. In our experiments, we find that Transformer-based models struggle with this task, performing close to random guessing, while ARM achieves better performance by leveraging symbolic knowledge and interpretable reasoning steps. Both methods still lag far behind human performance, leaving ample room for future research.
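To make the ARM-style deduction concrete, here is a minimal sketch of how symbolic knowledge extracted from an LSAT-style scenario (participants, constraints) can be used to enumerate legitimate solutions and answer a "must be true" question. The example puzzle, the constraint predicates, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of ARM-style symbolic reasoning over one LSAT "logic game".
# The scenario, constraints, and options below are invented for illustration.
from itertools import permutations

# Participants extracted from the scenario: five speakers scheduled into
# five ordered slots (index 0 is the earliest slot).
participants = ["A", "B", "C", "D", "E"]

# Facts extracted from the scenario text, encoded as predicates over a
# candidate ordering of the participants.
constraints = [
    lambda order: order.index("A") < order.index("B"),            # A speaks before B
    lambda order: abs(order.index("C") - order.index("D")) == 1,  # C is adjacent to D
    lambda order: order[0] != "E",                                # E does not speak first
]

def legitimate_solutions(participants, constraints):
    """Enumerate all orderings consistent with every extracted constraint."""
    return [order for order in permutations(participants)
            if all(check(order) for check in constraints)]

def answer_question(option_predicates, solutions):
    """Return the option whose predicate holds in every legitimate solution."""
    for label, must_hold in option_predicates.items():
        if all(must_hold(order) for order in solutions):
            return label
    return None

solutions = legitimate_solutions(participants, constraints)

# Question: "Which one of the following must be true?"
options = {
    "(A)": lambda order: order.index("B") > 0,  # B does not speak first
    "(B)": lambda order: order[-1] == "E",      # E speaks last
}
print(answer_question(options, solutions))  # -> "(A)", since A must precede B
```

Because every reasoning step is an explicit predicate over enumerated assignments, the deduction is interpretable: one can inspect exactly which constraint rules out each candidate ordering, in contrast to the opaque scoring of a Transformer-based multiple-choice baseline.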
