The limits of human predictions of recidivism

Statistical algorithms can outperform human predictions of recidivism. Dressel and Farid recently found that laypeople were as accurate as statistical algorithms in predicting whether a defendant would reoffend, casting doubt on the value of risk assessment tools in the criminal justice system. We report the results of a replication and extension of Dressel and Farid’s experiment. Under conditions similar to the original study, we found nearly identical results, with humans and algorithms performing comparably. However, algorithms outperformed humans on the three other datasets we examined. The performance gap between humans and algorithms was particularly pronounced when, in a departure from the original study, participants were not provided with immediate feedback on the accuracy of their responses. Algorithms also outperformed humans when the information provided for predictions included an enriched (versus restricted) set of risk factors. These results suggest that algorithms can outperform human predictions of recidivism in ecologically valid settings.

[1] D. A. Andrews et al. The Recent Past and Near Future of Risk and/or Need Assessment, 2006.

[2] Avi Feller et al. Algorithmic Decision Making and the Cost of Fairness, 2017, KDD.

[3] Ben Green et al. Disparate Interactions: An Algorithm-in-the-Loop Analysis of Fairness in Risk Assessments, 2019, FAT.

[4] M. Stevenson et al. Assessing Risk Assessment in Action, 2018.

[5] R. Dawes et al. Heuristics and Biases: Clinical versus Actuarial Judgment, 2002.

[6] Daryl G. Kroner et al. The Impact of Base Rate Utilization and Clinical Experience on the Accuracy of Judgments Made with the HCR-20, 2014.

[7] Vernon L. Quinsey et al. The Likelihood of Violent Behaviour: Predictions, Postdictions, and Hindsight Bias, 1995.

[8] Alexandra Chouldechova et al. Does mitigating ML's impact disparity require treatment disparity?, 2017, NeurIPS.

[9] Christopher T. Lowenkamp et al. False Positives, False Negatives, and False Analyses: A Rejoinder to "Machine Bias: There's Software Used across the Country to Predict Future Criminals. And It's Biased against Blacks", 2016.

[10] Peter A. Flach et al. A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss, 2012.

[11] Ravi Shroff et al. The accuracy, equity, and jurisprudence of criminal risk assessment, 2021, Research Handbook on Big Data Law.

[12] Julia Dressel and Hany Farid. The accuracy, fairness, and limits of predicting recidivism, 2018, Science Advances.

[13] D. A. Andrews et al. The Level of Service Inventory–Revised, 1995.

[14] Alexandra Chouldechova et al. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, 2016, Big Data.

[15] Christopher T. Lowenkamp et al. Evaluation of Ohio's Community Based Correctional Facilities and Halfway House Programs: Final Report, 2002.

[16] Chris Guthrie et al. Blinking on the Bench: How Judges Decide Cases, 2007.

[17] Sharad Goel et al. The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning, 2018, arXiv.

[18] Jon M. Kleinberg et al. Inherent Trade-Offs in the Fair Determination of Risk Scores, 2016, ITCS.

[19] J. Koehler. The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges, 1996, Behavioral and Brain Sciences.

[20] F. Galton. Vox Populi, 1907, Nature.

[21] Kaspar Rufibach et al. Use of Brier score to assess binary predictions, 2010, Journal of Clinical Epidemiology.

[22] G. Brier. Verification of Forecasts Expressed in Terms of Probability, 1950.

[23] A. Tversky et al. On the psychology of prediction, 1973.

[24] Genna R. Cohen et al. The Meta-Analysis of Clinical Judgment Project: Fifty-Six Years of Accumulated Research on Clinical Versus Statistical Prediction, 2006.

[25] Christopher T. Lowenkamp et al. Validating the Level of Service Inventory–Revised on a Sample of Federal Probationers, 2017.

[26] G. Harris et al. The accuracy of recidivism risk assessments for sexual offenders: a meta-analysis of 118 prediction studies, 2009, Psychological Assessment.

[27] Jennifer L. Skeem et al. Risk Assessment in Criminal Sentencing, 2016, Annual Review of Clinical Psychology.

[28] Emre Soyer et al. Sequentially simulated outcomes: kind experience versus nontransparent description, 2011, Journal of Experimental Psychology: General.

[29] Robin M. Hogarth et al. The Two Settings of Kind and Wicked Learning Environments, 2015.

[30] W. Grove et al. Clinical versus mechanical prediction: a meta-analysis, 2000, Psychological Assessment.

[31] Chiraag Sumanth et al. Studying the "Wisdom of Crowds" at Scale, 2019, HCOMP.