BERT-PLI: Modeling Paragraph-Level Interactions for Legal Case Retrieval

Legal case retrieval is a specialized IR task that involves retrieving supporting cases given a query case. Compared with traditional ad-hoc text retrieval, it is more challenging because the query case is much longer and more complex than a typical keyword query. In addition, relevance between a query case and a supporting case goes beyond general topical relevance, which makes it difficult to construct a large-scale case retrieval dataset, especially one with accurate relevance judgments. To address these challenges, we propose BERT-PLI, a novel model that uses BERT to capture semantic relationships at the paragraph level and then infers the relevance between two cases by aggregating paragraph-level interactions. We fine-tune the BERT model on a relatively small-scale case law entailment dataset to adapt it to the legal scenario and employ a cascade framework to reduce the computational cost. We conduct extensive experiments on the benchmark of the relevant case retrieval task in COLIEE 2019. Experimental results demonstrate that our proposed method outperforms existing solutions.
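
The two ideas named above, a cascade that prunes candidates cheaply before expensive scoring and a case-level score built from paragraph-pair interactions, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `token_overlap` is a hypothetical stand-in for a fine-tuned BERT paragraph-pair scorer, the first stage uses the same overlap measure in place of a real lexical ranker, and the max-then-mean aggregation is one simple choice rather than the aggregation the model itself uses.

```python
from typing import Callable, List, Sequence, Tuple

Paragraphs = Sequence[str]
PairScorer = Callable[[str, str], float]


def token_overlap(a: str, b: str) -> float:
    """Hypothetical stand-in for a BERT paragraph-pair scorer (Jaccard overlap)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def interaction_matrix(
    query: Paragraphs, candidate: Paragraphs, pair_score: PairScorer
) -> List[List[float]]:
    """Score every (query paragraph, candidate paragraph) pair."""
    return [[pair_score(q, c) for c in candidate] for q in query]


def aggregate(matrix: List[List[float]]) -> float:
    """Assumed aggregation: best match per query paragraph, then the mean."""
    if not matrix or not matrix[0]:
        return 0.0
    row_max = [max(row) for row in matrix]
    return sum(row_max) / len(row_max)


def cascade_rank(
    query: Paragraphs,
    candidates: Sequence[Paragraphs],
    first_stage_k: int,
    pair_score: PairScorer = token_overlap,
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, best first."""
    # Stage 1: a cheap whole-document score keeps only the top-k candidates.
    query_text = " ".join(query)
    kept = sorted(
        range(len(candidates)),
        key=lambda i: token_overlap(query_text, " ".join(candidates[i])),
        reverse=True,
    )[:first_stage_k]
    # Stage 2: paragraph-level interactions are computed only for survivors.
    scored = [
        (i, aggregate(interaction_matrix(query, candidates[i], pair_score)))
        for i in kept
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the pairwise scorer is passed in as `pair_score`, a learned model could replace the overlap function without changing the cascade logic; the cost saving comes from the second stage running on `first_stage_k` candidates instead of the full pool.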
