Scoring Model Predictions using Cross-Validation

We formalize a framework for quantitatively assessing agreement between two datasets that are assumed to come from two distinct data generating mechanisms. We propose a methodology for prediction scoring which provides a measure of the distance between two unobserved data generating mechanisms (DGMs), along the dimension of a particular model. The cross-validated scores can be used to evaluate preregistered hypotheses and to perform model validation in the face of complex statistical models. Using human behavior data from the Next Generation Social Science (NGS2) program, we demonstrate that prediction scores can be used as model assessment tools and that they can reveal insights based on data collected from different populations and across different settings. Our proposed cross-validated prediction scores are capable of quantifying true differences between data generating mechanisms, allow for the validation and assessment of complex models, and serve as valuable tools for reproducible research.
