A Comparative Study of Classifying Legal Documents with Neural Networks

In recent years, deep learning has shown promising results in the field of natural language processing (NLP). Neural networks (NNs) such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used for various NLP tasks, including sentiment analysis, information retrieval, and document classification. In this paper, we present the Supreme Court Classifier (SCC), a system that applies these methods to the document classification of legal court opinions. We compare traditional machine learning methods with recent NN-based methods. We also present a CNN combined with pre-trained word vectors that improves on the state of the art on our dataset. We train and evaluate our system on the Washington University School of Law Supreme Court Database (SCDB). Our best system (word2vec + CNN) achieves 72.4% accuracy when classifying court decisions into 15 broad SCDB categories and 31.9% accuracy when classifying among 279 finer-grained SCDB categories.
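The best-performing configuration described above (pre-trained word2vec embeddings feeding a CNN text classifier) follows the general design of Kim-style sentence CNNs. The following is a minimal sketch of that architecture in PyTorch, not the authors' released code: the filter widths, number of filters, dropout rate, and vocabulary size are illustrative assumptions rather than hyperparameters reported in the paper, and random vectors stand in for the actual word2vec embeddings.

```python
# Sketch of a CNN text classifier initialized with pretrained word vectors
# (Kim-2014-style). All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    def __init__(self, pretrained_embeddings, num_classes,
                 filter_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        vocab_size, embed_dim = pretrained_embeddings.shape
        # Embedding layer initialized from pretrained (e.g. word2vec) vectors.
        self.embedding = nn.Embedding.from_pretrained(
            torch.as_tensor(pretrained_embeddings, dtype=torch.float32),
            freeze=False)
        # One 1-D convolution per filter width, each followed by
        # max-over-time pooling.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k)
            for k in filter_sizes)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)
        return self.fc(self.dropout(features))          # unnormalized class scores


if __name__ == "__main__":
    # Toy usage: random "pretrained" vectors, 15 broad SCDB categories.
    vocab_size, embed_dim, num_classes = 10_000, 300, 15
    embeddings = torch.randn(vocab_size, embed_dim)
    model = TextCNN(embeddings, num_classes)
    batch = torch.randint(0, vocab_size, (4, 200))  # 4 opinions, 200 tokens each
    print(model(batch).shape)                       # torch.Size([4, 15])
```

Each convolution slides over the token sequence and max-pooling keeps the strongest activation per filter, so the concatenated feature vector summarizes the most salient n-grams of an opinion regardless of where they occur in the document.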
