What to Read in a Contract? Party-Specific Summarization of Important Obligations, Entitlements, and Prohibitions in Legal Documents

Legal contracts, such as employment or lease agreements, are important documents because they govern the obligations and entitlements of the contracting parties. However, these documents are typically long and written in legalese, so understanding them takes many hours of manual effort. In this paper, we address the task of summarizing legal contracts for each of the contracting parties, to enable faster review and improved understanding. Specifically, we collect a dataset of pairwise importance comparison annotations by legal experts for ∼293K sentence pairs from lease agreements. We propose a novel extractive summarization system that automatically produces a summary consisting of the most important obligations, entitlements, and prohibitions in a contract. It consists of two modules: (1) a content categorizer that identifies the sentences containing each category (i.e., obligation, entitlement, and prohibition) for a party, and (2) an importance ranker that compares the importance of sentences within each category for a party to obtain a ranked list. The final summary is produced by selecting the most important sentences of each category for each party. We demonstrate the effectiveness of the proposed system by comparing it against several text ranking baselines via automatic and human evaluation.
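
The two-module design described above can be illustrated with a minimal Python sketch. It is not the system proposed in the paper: the keyword cues in `CUES`, the helper names `categorize`, `rank`, and `summarize`, and the toy `prefer` function are all hypothetical stand-ins for the learned content categorizer and importance ranker, and counting pairwise wins is only one simple way to turn pairwise comparisons into a ranked list.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword cues standing in for a learned content categorizer.
# Prohibition is listed first so that "shall not" is not read as "shall".
CUES = {
    "prohibition": ("shall not", "must not", "may not"),
    "obligation": ("shall", "must"),
    "entitlement": ("may", "is entitled to"),
}

def categorize(sentence, party):
    """Return the category of `sentence` for `party`, or None.

    Toy rule (an assumption): the party must be named before the modal cue.
    """
    text = sentence.lower()
    for category, cues in CUES.items():
        for cue in cues:
            idx = text.find(cue)
            if idx != -1 and party.lower() in text[:idx]:
                return category
    return None

def rank(sentences, prefer):
    """Order sentences by pairwise wins, where prefer(a, b) is True if a beats b."""
    wins = {s: 0 for s in sentences}
    for a, b in combinations(sentences, 2):
        wins[a if prefer(a, b) else b] += 1
    return sorted(sentences, key=lambda s: wins[s], reverse=True)

def summarize(sentences, parties, prefer, top_k=1):
    """Keep the top_k sentences of each category for each party."""
    summary = {}
    for party in parties:
        buckets = defaultdict(list)
        for s in sentences:
            category = categorize(s, party)
            if category is not None:
                buckets[category].append(s)
        summary[party] = {c: rank(items, prefer)[:top_k]
                          for c, items in buckets.items()}
    return summary

# Invented example sentences, not taken from the paper's dataset.
sentences = [
    "Tenant shall pay rent on the first day of each month.",
    "Tenant shall keep the premises clean.",
    "Tenant shall not sublet the premises without consent.",
    "Landlord may enter the premises upon reasonable notice.",
]

# Hypothetical comparator: treats sentences mentioning rent as more important.
def prefer(a, b):
    return "rent" in a.lower()

summary = summarize(sentences, ["Tenant", "Landlord"], prefer)
```

In this sketch, `summary` maps each party to its categories and the top-ranked sentence in each. Replacing `categorize` and `prefer` with trained models would leave the selection logic in `summarize` unchanged.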
