In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction

In recent years, academics and investigative journalists have criticized certain commercial risk assessments for their black-box nature and failure to satisfy competing notions of fairness. Since then, the field of interpretable machine learning has created simple yet effective algorithms, while the field of fair machine learning has proposed various mathematical definitions of fairness. However, studies from these fields are largely independent, despite the fact that many applications of machine learning to social issues require both fairness and interpretability. We explore the intersection by revisiting the recidivism prediction problem using state-of-the-art tools from interpretable machine learning, and assessing the models for performance, interpretability, and fairness. Unlike previous work, we compare against two existing risk assessments (COMPAS and the Arnold Public Safety Assessment) and train models that output probabilities rather than binary predictions. We present multiple models that outperform these risk assessments, and provide a fairness analysis of these models. Our results imply that machine learning models should be trained separately for separate locations, and updated over time.

[37]  Tianqi Chen,et al.  XGBoost: A Scalable Tree Boosting System , 2016, KDD.

[38]  Christopher T. Lowenkamp,et al.  False Positives, False Negatives, and False Analyses: A Rejoinder to "Machine Bias: There's Software Used across the Country to Predict Future Criminals. and It's Biased against Blacks" , 2016 .

[39]  Brandon Smith Auditing Deep Neural Networks to Understand Recidivism Predictions , 2016 .

[40]  COMPAS Risk Scales : Demonstrating Accuracy Equity and Predictive Parity Performance of the COMPAS Risk Scales in Broward County , 2016 .

[41]  Nathan Srebro,et al.  Equality of Opportunity in Supervised Learning , 2016, NIPS.

[42]  S. Bushway,et al.  Identifying Classes of Explanations for Crime Drop: Period and Cohort Effects for New York State , 2016 .

[43]  D. Kehl,et al.  Algorithms in the Criminal Justice System: Assessing the Use of Risk Assessments in Sentencing , 2017 .

[44]  Brian Onieal Model Penal Code , 2017 .

[45]  Cynthia Rudin,et al.  Optimized Risk Scores , 2017, KDD.

[46]  Richard A. Berk,et al.  An impact assessment of machine learning risk forecasts on parole board decisions and recidivism , 2017, Journal of Experimental Criminology.

[47]  Margo I. Seltzer,et al.  Learning Certifiably Optimal Rule Lists , 2017, KDD.

[48]  Alexandra Chouldechova,et al.  Fair prediction with disparate impact: A study of bias in recidivism prediction instruments , 2016, Big Data.

[49]  Cynthia Rudin,et al.  Learning Cost-Effective and Interpretable Treatment Regimes , 2017, AISTATS.

[50]  Avi Feller,et al.  Algorithmic Decision Making and the Cost of Fairness , 2017, KDD.

[51]  Seth Neel,et al.  A Convex Framework for Fair Regression , 2017, ArXiv.

[52]  Jon M. Kleinberg,et al.  Inherent Trade-Offs in the Fair Determination of Risk Scores , 2016, ITCS.

[53]  Jon M. Kleinberg,et al.  On Fairness and Calibration , 2017, NIPS.

[54]  Christopher Slobogin,et al.  Algorithmic risk assessments and the double-edged sword of youth. , 2018, Behavioral sciences & the law.

[55]  John Langford,et al.  A Reductions Approach to Fair Classification , 2018, ICML.

[56]  Risk and Needs Assessment in the Federal Prison System [July 10, 2018] , 2018 .

[57]  Reuben Binns,et al.  Fairness in Machine Learning: Lessons from Political Philosophy , 2017, FAT.

[58]  Jon M. Kleinberg,et al.  Inherent Trade-Offs in Algorithmic Fairness , 2018, PERV.

[59]  M. Kearns,et al.  Fairness in Criminal Justice Risk Assessments: The State of the Art , 2017, Sociological Methods & Research.

[60]  Julia Rubin,et al.  Fairness Definitions Explained , 2018, 2018 IEEE/ACM International Workshop on Software Fairness (FairWare).

[61]  Sharad Goel,et al.  The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning , 2018, ArXiv.

[62]  Ben Matthews,et al.  Rethinking one of criminology’s ‘brute facts’: The age–crime curve and the crime drop in Scotland , 2017, European journal of criminology.

[63]  Anna Bindler,et al.  How Punishment Severity Affects Jury Verdicts: Evidence from Two Natural Experiments , 2018, American Economic Journal: Economic Policy.

[64]  M. Stevenson,et al.  Assessing Risk Assessment in Action , 2018 .

[65]  Thomas S. Woodson Weapons of math destruction , 2018, Journal of Responsible Innovation.

[66]  Miroslav Dudík,et al.  Fair Regression: Quantitative Definitions and Reduction-based Algorithms , 2019, ICML.

[67]  Plamen Angelov,et al.  Fair-by-design explainable models for prediction of recidivism , 2019, ArXiv.

[68]  R. Berk Accuracy and Fairness for Juvenile Justice Risk Assessments , 2019, Journal of Empirical Legal Studies.

[69]  Cynthia Rudin,et al.  Learning Optimized Risk Scores , 2016, J. Mach. Learn. Res..

[70]  Cynthia Rudin,et al.  Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead , 2018, Nature Machine Intelligence.

[71]  Jongbin Jung,et al.  The limits of human predictions of recidivism , 2020, Science Advances.

[72]  Cynthia Rudin,et al.  The age of secrecy and unfairness in recidivism prediction , 2018, 2.1.

[73]  Brandon L. Garrett,et al.  Open Risk Assessment , 2020, Behavioral sciences & the law.

[74]  Richard A. Berk,et al.  Nested conformal prediction sets for classification with applications to probation data , 2021, The Annals of Applied Statistics.

[75]  Subrajeet Mohapatra,et al.  Development of Risk Assessment Framework for First Time Offenders Using Ensemble Learning , 2021, IEEE Access.

[76]  Fragile Algorithms and Fallible Decision-Makers: Lessons from the Justice System , 2021, Journal of Economic Perspectives.