Recurrent neural network-based models for recognizing requisite and effectuation parts in legal texts

This paper proposes several recurrent neural network-based models for recognizing requisite and effectuation (RE) parts in legal texts. First, we propose a modification of the BiLSTM-CRF model that allows the use of external features, improving the performance of deep learning models when large annotated corpora are not available. However, this model can only recognize RE parts that do not overlap. Second, we propose two approaches for recognizing overlapping RE parts: a cascading approach, which uses a sequence of BiLSTM-CRF models, and a unified-model approach, which uses the multilayer BiLSTM-CRF model and the multilayer BiLSTM-MLP-CRF model. Experimental results on two Japanese legal RRE datasets demonstrate the advantages of our proposed models. On the Japanese National Pension Law dataset, our approaches obtained an F1 score of 93.27%, a significant improvement over previous approaches. On the Japanese Civil Code RRE dataset, which is written in English, our approaches achieved an F1 score of 78.24% in recognizing RE parts, a significant improvement over strong baselines. In addition, using external features and in-domain pre-trained word embeddings further improved the performance of RRE systems.
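All of the BiLSTM-CRF variants above share the same inference step: a CRF layer scores tag sequences by combining per-token emission scores (produced by the BiLSTM) with a tag-transition matrix, and Viterbi decoding recovers the best sequence. The sketch below, in plain NumPy, illustrates that decoding step; the function name and the toy dimensions are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   (seq_len, n_tags) per-token scores, e.g. BiLSTM outputs.
    transitions: (n_tags, n_tags) score of moving from tag i to tag j.
    """
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()                       # best score ending in each tag
    backptr = np.zeros((seq_len, n_tags), dtype=int)  # best previous tag per step
    for t in range(1, seq_len):
        # score of every (previous tag -> current tag) pair at position t
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

With a zero transition matrix this reduces to per-token argmax; a learned transition matrix is what lets the CRF enforce valid label sequences (e.g. that an I- tag follows a B- tag in BIO-style RE-part tagging).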
