Can GPT-3 Perform Statutory Reasoning?

Statutory reasoning is the task of reasoning with facts and statutes, which are rules written in natural language by a legislature. It is a basic legal skill. In this paper we explore the capabilities of the most capable GPT-3 model, text-davinci-003, on an established statutory-reasoning dataset called SARA. We consider a variety of approaches, including dynamic few-shot prompting, chain-of-thought prompting, and zero-shot prompting. Although GPT-3 surpasses the previous best published results on SARA, it still makes several types of clear errors, and we investigate why. We find that GPT-3 has imperfect prior knowledge of the actual U.S. statutes on which SARA is based. More importantly, we construct simple synthetic statutes that GPT-3 is guaranteed not to have seen during training, and find that it performs poorly at answering straightforward questions about them.
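
To make the prompting approaches concrete, here is a minimal sketch of a zero-shot chain-of-thought query to text-davinci-003, assuming OpenAI's legacy Python SDK (pre-1.0 Completions API) and an OPENAI_API_KEY in the environment. The prompt template, helper name, and placeholder statute text are illustrative assumptions, not the paper's actual SARA prompts.

```python
# Sketch of a zero-shot chain-of-thought query against text-davinci-003.
# Assumes the legacy openai Python SDK (<1.0); statute/facts are placeholders.
import openai

def ask_statute_question(statute: str, facts: str, question: str) -> str:
    """Pose an entailment-style question about a statute, eliciting
    step-by-step reasoning with a zero-shot CoT cue."""
    prompt = (
        f"Statute:\n{statute}\n\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}\n"
        "Let's think step by step."
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
        temperature=0.0,  # deterministic decoding for evaluation
    )
    return response["choices"][0]["text"].strip()
```

A dynamic few-shot variant would prepend worked statute/facts/answer examples retrieved per test case to the same prompt; a plain zero-shot variant would drop the "Let's think step by step" cue and ask for the answer directly.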
