Using Regression Kernels to Forecast A Failure to Appear in Court

Forecasts of prospective criminal behavior have long been an important feature of many criminal justice decisions. There is now substantial evidence that machine learning procedures will classify and forecast at least as well as, and typically better than, logistic regression, which has to date dominated conventional practice. However, machine learning procedures are adaptive. They "learn" inductively from training data. As a result, they typically perform best with very large datasets. There is a need, therefore, for forecasting procedures with the promise of machine learning that will perform well with small to moderately sized datasets. Kernel methods provide precisely that promise. In this paper, we offer an overview of kernel methods in regression settings and compare such a method, regularized with principal components, to stepwise logistic regression. We apply both to a timely and important criminal justice concern: a failure to appear (FTA) at court proceedings following an arraignment. A forecast of an FTA can be an important factor in a judge's decision to release a defendant while awaiting trial and can influence the conditions imposed on that release. Forecasting accuracy matters, and our kernel approach forecasts far more accurately than stepwise logistic regression. The methods developed here are implemented in the R package kernReg, currently available on CRAN.
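The abstract does not spell out the estimator, so the following Python/NumPy sketch only illustrates the general idea of a kernel regression regularized with principal components, under stated assumptions: a Gaussian (RBF) kernel with a hypothetical bandwidth `gamma`, regularization by keeping only the `k` leading eigenvectors (principal components) of the training kernel matrix, and an ordinary least-squares (linear probability) fit of a 0/1 FTA indicator on those components. The names `rbf_kernel`, `fit_kernel_pcr`, and `predict` are hypothetical, the data are synthetic, and this is not the kernReg implementation.

```python
# Hypothetical sketch of principal-components-regularized kernel regression.
# NOT the kernReg implementation; kernel, gamma, k, and threshold are assumptions.
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def fit_kernel_pcr(X, y, gamma=0.5, k=10):
    """Least-squares fit of y on the top-k eigenvectors of the training
    kernel matrix (principal-components regularization)."""
    K = rbf_kernel(X, X, gamma)
    eigvals, eigvecs = np.linalg.eigh(K)        # symmetric K; ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    lam, V = eigvals[top], eigvecs[:, top]
    Z = V * lam                                 # equals K @ V because K V = V diag(lam)
    Z1 = np.column_stack([np.ones(len(y)), Z])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return {"X": X, "gamma": gamma, "V": V, "beta": beta, "fitted": Z1 @ beta}


def predict(model, X_new):
    """Project new rows onto the retained components and apply the fit."""
    K_new = rbf_kernel(X_new, model["X"], model["gamma"])
    Z1 = np.column_stack([np.ones(len(X_new)), K_new @ model["V"]])
    return Z1 @ model["beta"]


# Usage with synthetic data (a stand-in for real arraignment records).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # 3 hypothetical predictors
p_fta = 1.0 / (1.0 + np.exp(-(X[:, 0] ** 2 - 1.0)))    # nonlinear "true" risk
y = (rng.uniform(size=200) < p_fta).astype(float)      # 1 = failure to appear

model = fit_kernel_pcr(X, y, gamma=0.5, k=10)
X_new = rng.normal(size=(5, 3))
scores = predict(model, X_new)                         # linear-probability scores
forecast = (scores > 0.5).astype(int)                  # illustrative 0.5 threshold
print(forecast)
```

Keeping fewer components (smaller `k`) limits the effective complexity of the fit, which is one plausible way to keep a kernel method stable on small to moderately sized datasets. In practice `gamma`, `k`, and the classification threshold would be tuned on held-out data rather than fixed as in this sketch.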
