Identification of Judicial Outcomes in Judgments: A Generalized Gini-PLS Approach

This paper presents and compares several text classification models for extracting the outcome of a judgment from justice decisions, i.e., legal documents summarizing the rulings made by a judge. Such models can be used to gather important statistics about cases, e.g., success rates according to specific characteristics of the parties or the jurisdiction, and are therefore important for the development of judicial prediction as well as for the study of law enforcement in general. In particular, we propose the generalized Gini-PLS, which makes better use of the information in the distribution tails while attenuating, as the simple Gini-PLS does, the influence exerted by outliers. Modeling the task as supervised binary classification, we also introduce the LOGIT-Gini-PLS, which is suited to explaining a binary target variable. In addition, we study various technical aspects of the evaluated text classification approaches, which consist of combinations of judgment representations and classification algorithms, using an annotated corpus of French justice decisions.
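
The claim that Gini-based methods attenuate the influence of outliers can be illustrated with a small sketch. It is not the generalized Gini-PLS proposed in the paper; it only assumes that Gini-type weights rely on a covariance between the target and the rank vector of a feature, cov(y, R(x)), rather than on the feature values themselves. The function names and the toy data are hypothetical.

```python
import numpy as np


def rank_vector(x):
    """Return the ranks 1..n of the values in x (ties broken by position)."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks


def gini_cov(y, x):
    """Rank-based (Gini-type) covariance cov(y, R(x)). Illustrative only."""
    r = rank_vector(x)
    return float(np.mean((y - y.mean()) * (r - r.mean())))


def plain_cov(y, x):
    """Ordinary covariance cov(y, x), the quantity used by standard PLS weights."""
    return float(np.mean((y - y.mean()) * (x - x.mean())))


# Toy binary outcome and one feature (made-up values, for illustration).
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
x = np.array([0.1, 0.3, 0.4, 0.2, 0.6, 0.9])

# Same feature with its largest value turned into an extreme outlier.
x_outlier = x.copy()
x_outlier[-1] = 90.0

print("plain cov:", plain_cov(y, x), "->", plain_cov(y, x_outlier))
print("gini cov: ", gini_cov(y, x), "->", gini_cov(y, x_outlier))
```

Because the outlier keeps the same rank, the rank-based covariance is unchanged, whereas the ordinary covariance is dominated by the extreme value.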
