European Court of Human Rights Open Data project

This paper presents thirteen datasets for binary, multiclass and multilabel classification based on the judgments of the European Court of Human Rights since the Court's creation. The interest of such datasets is explained from the perspective of the researcher, the data scientist, the citizen and the legal practitioner. Unlike many datasets, the whole creation process, from the collection of raw data to the feature transformation, is provided in the form of a collection of fully automated, open-source scripts. This ensures reproducibility and a high level of confidence in the processed data, two of the most important issues in data governance today. A first experimental campaign studies some predictability properties and establishes baseline results with popular machine learning algorithms. The results are consistently good across the binary datasets, with accuracy ranging from 75.86% to 98.32% and an average accuracy of 96.45%.
