Exploring the Use of Text Classification in the Legal Domain

In this paper, we investigate the application of text classification methods to support legal professionals. We present several experiments applying machine learning techniques to predict, with high accuracy, the rulings of the French Supreme Court and the law area to which a case belongs. We also investigate how the time period in which a ruling was made influences the form of its case description, and the extent to which information must be masked in a full case ruling to automatically obtain training and test data that resemble case descriptions. We developed a mean probability ensemble system that combines the outputs of multiple Support Vector Machine (SVM) classifiers. We report an average F1 score of 98% for predicting a case ruling, 96% for predicting the law area of a case, and 87.07% for estimating the date of a ruling.
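The abstract does not spell out how the ensemble combines its members. A minimal sketch of the mean probability rule follows, assuming each SVM has already been trained and outputs calibrated class probabilities (for example via Platt scaling). The function name `mean_probability_ensemble`, the labels `label_A`/`label_B`, and all probability values are hypothetical illustrations, not the authors' code or data.

```python
from typing import List, Sequence


def mean_probability_ensemble(
    member_probs: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[str],
) -> List[str]:
    """Combine several classifiers with the mean probability rule.

    member_probs[m][i][k] is the probability that classifier m assigns to
    labels[k] for sample i. Assumes every member already outputs calibrated
    class probabilities (for an SVM, e.g. via Platt scaling).
    """
    if not member_probs:
        raise ValueError("need at least one classifier")
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    predictions: List[str] = []
    for i in range(n_samples):
        # Average each label's probability across all ensemble members.
        mean = [
            sum(member_probs[m][i][k] for m in range(n_members)) / n_members
            for k in range(len(labels))
        ]
        # Predict the label with the highest averaged probability
        # (max() keeps the first label on ties).
        best = max(range(len(labels)), key=lambda k: mean[k])
        predictions.append(labels[best])
    return predictions


# Hypothetical outputs of three already-trained SVMs on two cases,
# over two hypothetical ruling labels.
example_probs = [
    [[0.6, 0.4], [0.2, 0.8]],  # SVM 1
    [[0.3, 0.7], [0.4, 0.6]],  # SVM 2
    [[0.7, 0.3], [0.1, 0.9]],  # SVM 3
]
print(mean_probability_ensemble(example_probs, ["label_A", "label_B"]))
# → ['label_A', 'label_B']
```

Because the rule averages probabilities rather than counting hard votes, a confident member can outweigh less confident ones: in the example, the second SVM alone would pick `label_B` for the first case, but the averaged probabilities favor `label_A`.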