Threshold-Based Retrieval and Textual Entailment Detection on Legal Bar Exam Questions

Getting an overview of the legal domain has become challenging, especially in a broad, international context. Legal question answering systems could ease this task by automatically retrieving the legal texts relevant to a given statement and checking whether the meaning of the statement can be inferred from the retrieved documents. We investigate combining the BM25 scoring of Elasticsearch with word embeddings trained on English translations of the German and Japanese civil codes. To this end, we define criteria that select a dynamic number of relevant documents based on threshold scores. For the textual entailment task, exploiting the respective prediction biases of two deep learning classifiers through a threshold-based answer inclusion criterion proved beneficial compared to the baseline.
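The abstract does not say how the BM25 and embedding scores are combined or what the thresholds are. The following is a minimal sketch of one plausible reading. It assumes a linear interpolation (weight `alpha`) of max-normalized BM25 scores and the cosine similarity of averaged word vectors, followed by a cutoff relative to the best score. BM25 is reimplemented in plain Python with Elasticsearch's default parameters (k1 = 1.2, b = 0.75) and the Lucene-style IDF. It does not query an actual Elasticsearch index. The function names and the parameters `alpha`, `rel` and `min_score` are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc, using the Lucene-style IDF."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        length_norm = k1 * (1 - b + b * len(d) / avgdl)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue  # term absent from this doc: contributes nothing
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + length_norm)
        scores.append(s)
    return scores

def mean_vector(tokens, emb):
    """Average the embeddings of in-vocabulary tokens (None if none are known)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is missing or all zeros."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, docs, emb, alpha=0.5, rel=0.8, min_score=0.2):
    """Return indices of all docs passing both thresholds, best first.

    Hypothetical fusion: alpha * max-normalized BM25 + (1 - alpha) * cosine.
    """
    bm = bm25_scores(query, docs)
    top = max(bm) or 1.0  # avoid dividing by zero when no term matches
    q = mean_vector(query, emb)
    combined = [alpha * s / top + (1 - alpha) * cosine(q, mean_vector(d, emb))
                for s, d in zip(bm, docs)]
    best = max(combined)
    ranked = sorted(range(len(docs)), key=lambda i: -combined[i])
    # Dynamic cutoff: keep every document whose score is close enough to the best.
    return [i for i in ranked
            if combined[i] >= min_score and combined[i] >= rel * best]
```

On a toy three-article corpus where the first two articles point in the same direction of a two-dimensional embedding space, `rel=0.8` returns only the top article. Lowering it to `rel=0.5` also admits the second one. The size of the returned set therefore follows the score distribution of each query instead of being a fixed top-k.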
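The abstract does not spell out the exact answer inclusion rule. The following is a minimal sketch of one plausible reading. It assumes that each classifier outputs a probability that the statement is entailed ("Y"), that one model tends to over-predict "Y" and the other "N", and that a prediction going against a model's own bias is trusted only above a confidence threshold. Otherwise the two probabilities are averaged. The names `include_answer` and `accuracy`, the 0.7 threshold and the averaging fallback are all hypothetical.

```python
def include_answer(p_yes_lenient, p_yes_strict, threshold=0.7):
    """Combine two entailment classifiers with opposite prediction biases.

    p_yes_lenient: P(entailed) from a model assumed to over-predict "Y".
    p_yes_strict:  P(entailed) from a model assumed to over-predict "N".
    """
    # The strict model rarely says "Y", so a confident "Y" from it is trusted.
    if p_yes_strict >= threshold:
        return "Y"
    # Likewise, a confident "N" from the lenient model is trusted.
    if 1.0 - p_yes_lenient >= threshold:
        return "N"
    # Neither model is confident against its own bias: average the two.
    return "Y" if (p_yes_lenient + p_yes_strict) / 2 >= 0.5 else "N"

def accuracy(examples, threshold=0.7):
    """examples: iterable of (p_yes_lenient, p_yes_strict, gold_label)."""
    examples = list(examples)
    hits = sum(include_answer(a, b, threshold) == gold for a, b, gold in examples)
    return hits / len(examples)
```

Under these assumptions, the threshold would be chosen on held-out questions, for example by comparing `accuracy` for several candidate values against a single-classifier baseline.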
