Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents

Legal artificial intelligence (LegalAI) aims to benefit legal systems with artificial intelligence technology, especially natural language processing (NLP). Recently, inspired by the success of pre-trained language models (PLMs) in the generic domain, many LegalAI researchers have devoted their efforts to applying PLMs to legal tasks. However, applying PLMs to legal tasks remains challenging, as legal documents usually consist of thousands of tokens, far more than mainstream PLMs can process. In this paper, we release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents. We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering. The experimental results demonstrate that our model achieves promising improvements on tasks with long documents as inputs.
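
Since Lawformer follows the Longformer architecture, it can in principle be loaded through the standard Hugging Face Transformers interface. Below is a minimal sketch of encoding a long legal document this way; the checkpoint identifiers ("thunlp/Lawformer" for the model, "hfl/chinese-roberta-wwm-ext" for the vocabulary) and the example text are assumptions for illustration, not details confirmed by this abstract.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumed checkpoint identifiers; substitute the authors' released names.
    tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
    model = AutoModel.from_pretrained("thunlp/Lawformer")

    # A judgment document's fact description can run to thousands of tokens;
    # Longformer-style sparse attention lets the encoder accept such inputs
    # (here capped at 4096 positions).
    fact = "任某提起诉讼，请求判令被告支付货款..."  # illustrative snippet
    inputs = tokenizer(fact, return_tensors="pt",
                       truncation=True, max_length=4096)

    # Longformer distinguishes local from global attention: most tokens
    # attend only within a sliding window, while designated tokens (here,
    # the first token) attend to the whole sequence.
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    with torch.no_grad():
        outputs = model(**inputs, global_attention_mask=global_attention_mask)
    doc_repr = outputs.last_hidden_state[:, 0]  # first-token representation

The sliding-window plus global-token scheme is what reduces the cost of self-attention from quadratic to roughly linear in sequence length, which is why documents of thousands of tokens become tractable for pre-training and fine-tuning.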
