A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering

Legislation can be viewed as a body of prescriptive rules expressed in natural language. We refer to the application of legislation to the facts of a case, where those facts are also expressed in natural language, as statutory reasoning. Computational statutory reasoning is distinct from most existing work in machine reading in that much of the information needed to decide a case is declared exactly once (in a law), whereas the information needed in much of machine reading tends to be learned through distributional language statistics. To investigate the performance of natural language understanding approaches on statutory reasoning, we introduce a dataset, together with a legal-domain text corpus. Straightforward application of machine reading models yields low out-of-the-box performance on our questions, whether or not the models have been fine-tuned to the legal domain. We contrast this with a hand-constructed Prolog-based system, designed to fully solve the task. These experiments support a discussion of the challenges facing statutory reasoning moving forward, which we argue is an interesting real-world task that can motivate the development of models able to utilize prescriptive rules specified in natural language.
