Forecasting Dangerous Inmate Misconduct: An Application of Ensemble Statistical Procedures

In this paper, we attempt to forecast which prison inmates are likely to engage in very serious misconduct while incarcerated. Such misconduct would usually be a major felony if committed outside of prison: drug trafficking, assault, rape, attempted murder, and other crimes. The binary response variable is problematic because it is highly unbalanced. Using data from nearly 10,000 inmates held in facilities operated by the California Department of Corrections, we show that several popular classification procedures do no better than the marginal distribution unless the data are weighted to compensate for the lack of balance. With such weighting, random forests performs reasonably well, and better than CART or logistic regression. Although fewer than 3% of the inmates studied over 24 months were reported for very serious misconduct, we are able to correctly forecast such behavior about half the time.
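To make the role of weighting concrete, the following is a minimal toy sketch, not the procedure used in the paper. It assumes synthetic data with a base rate of about 3%, a single hypothetical risk score in place of a fitted classifier, and an illustrative 20-to-1 cost ratio for missed cases versus false alarms; none of these values come from the study. It shows why an unweighted rule tends to collapse to the marginal distribution (forecast "no misconduct" for nearly everyone), while weighting the rare class makes the rule flag a meaningful share of true cases.

```python
import random

def make_data(n=10000, base_rate=0.03, seed=0):
    """Synthetic (score, outcome) pairs; all values are illustrative assumptions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < base_rate else 0
        # Hypothetical risk score: one standard deviation higher on average when y == 1.
        score = rng.gauss(1.0 if y else 0.0, 1.0)
        data.append((score, y))
    return data

def weighted_cost(data, threshold, fn_cost, fp_cost=1.0):
    """Total cost of flagging every case whose score is at or above the threshold."""
    fn = sum(1 for s, y in data if y == 1 and s < threshold)
    fp = sum(1 for s, y in data if y == 0 and s >= threshold)
    return fn_cost * fn + fp_cost * fp

def best_threshold(data, fn_cost):
    """Grid search for the cost-minimizing threshold; 'inf' means flag no one."""
    grid = [i / 10 for i in range(-40, 61)] + [float("inf")]
    return min(grid, key=lambda t: weighted_cost(data, t, fn_cost))

def recall(data, threshold):
    """Share of true positive cases that the rule flags."""
    positives = sum(y for _, y in data)
    flagged = sum(1 for s, y in data if y == 1 and s >= threshold)
    return flagged / positives

data = make_data()

# Equal costs: the rare class is nearly ignored.
t_equal = best_threshold(data, fn_cost=1.0)
r_equal = recall(data, t_equal)

# Assumed 20:1 cost ratio: missed cases are penalized heavily.
t_weighted = best_threshold(data, fn_cost=20.0)
r_weighted = recall(data, t_weighted)

print("equal costs:    threshold", t_equal, "recall", round(r_equal, 3))
print("weighted costs: threshold", t_weighted, "recall", round(r_weighted, 3))
```

The same idea carries over to tree-based ensembles, where the imbalance is usually handled through class weights or stratified sampling rather than a threshold on a single score; the cost ratio itself is a policy choice, not something the data determine.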
