In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction

In recent years, academics and investigative journalists have criticized certain commercial risk assessments for their black-box nature and failure to satisfy competing notions of fairness. Since then, the field of interpretable machine learning has created simple yet effective algorithms, while the field of fair machine learning has proposed various mathematical definitions of fairness. However, studies from these fields are largely independent, despite the fact that many applications of machine learning to social issues require both fairness and interpretability. We explore this intersection by revisiting the recidivism prediction problem using state-of-the-art tools from interpretable machine learning, and by assessing the resulting models for performance, interpretability, and fairness. Unlike previous work, we compare against two existing risk assessments (COMPAS and the Arnold Public Safety Assessment) and train models that output probabilities rather than binary predictions. We present multiple models that beat these risk assessments in performance, and provide a fairness analysis of these models. Our results imply that machine learning models should be trained separately for separate locations and updated over time.
