Building Legal Case Retrieval Systems with Lexical Matching and Summarization Using a Pre-Trained Phrase Scoring Model

We present our method for the legal case retrieval task of the 2019 Competition on Legal Information Extraction/Entailment (COLIEE 2019). Our approach rests on the premise that summarization benefits retrieval. On the one hand, we adopt a summarization-based model, called encoded summarization, which encodes a document into a continuous vector space that captures the document's summary properties; we train this document representation model on the COLIEE 2018 data. On the other hand, we extract lexical features from different parts of a query and its candidate cases, and we observe that comparing these parts of the query against those of its candidates improves retrieval performance. Combining these lexical features with the latent features produced by the summarization-based model improves performance further, and the combined system achieves state-of-the-art results on the competition benchmark for this task.
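The abstract combines two signals for ranking candidate cases: lexical features computed between parts of the query and parts of each candidate, and latent features from the encoded-summarization model. The following is a minimal sketch of how such a combination could be wired up; it is not the authors' implementation. It assumes that cases are already split into named parts (the part names `summary` and `facts` are hypothetical), that the lexical measure is plain term-frequency cosine similarity (the paper's actual lexical features may differ), that latent document vectors are given as input (standing in for encoded-summarization outputs), and that features are merged by a hand-set weighted sum rather than a learned ranker.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplistic stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_cosine(a: str, b: str) -> float:
    """Lexical similarity: cosine between raw term-frequency vectors of two texts."""
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def dense_cosine(u: list[float], v: list[float]) -> float:
    """Latent similarity: cosine between two dense document vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pairwise_lexical_features(query: dict[str, str], cand: dict[str, str]) -> list[float]:
    """One lexical score per (query part, candidate part) pair, in a fixed sorted order.

    Assumes every case uses the same set of part names, so features line up.
    """
    return [tf_cosine(query[qp], cand[cp]) for qp in sorted(query) for cp in sorted(cand)]


def rank_candidates(query, query_vec, candidates, weights) -> list[str]:
    """Rank candidate ids by a weighted sum of lexical features plus one latent feature.

    `candidates` maps id -> (parts dict, latent vector); `weights` is hypothetical
    and hand-set here, whereas a real system would learn how to combine features.
    """
    scored = []
    for cid, (parts, vec) in candidates.items():
        feats = pairwise_lexical_features(query, parts) + [dense_cosine(query_vec, vec)]
        if len(weights) != len(feats):
            raise ValueError(f"expected {len(feats)} weights, got {len(weights)}")
        scored.append((sum(w * f for w, f in zip(weights, feats)), cid))
    return [cid for _, cid in sorted(scored, key=lambda s: s[0], reverse=True)]


# Toy example with hypothetical texts and made-up latent vectors.
query = {"summary": "tenant evicted without notice", "facts": "landlord changed the locks"}
query_vec = [0.9, 0.1, 0.0]
candidates = {
    "case_A": ({"summary": "eviction without notice to tenant",
                "facts": "locks changed by landlord"}, [0.8, 0.2, 0.1]),
    "case_B": ({"summary": "patent infringement claim",
                "facts": "software license dispute"}, [0.0, 0.1, 0.9]),
}
# 2 query parts x 2 candidate parts = 4 lexical features, plus 1 latent feature.
weights = [0.25, 0.1, 0.1, 0.25, 1.0]
print(rank_candidates(query, query_vec, candidates, weights))  # ['case_A', 'case_B']
```

Keeping one feature per part pair, rather than a single whole-document score, is what lets a downstream combiner weight, for example, summary-to-summary overlap differently from facts-to-summary overlap, which mirrors the abstract's observation that comparing different parts helps.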
