Validation in prediction research: the waste by data splitting

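Accurate prediction of medical outcomes is important for diagnosis and prognosis. Major medical journals now routinely require that a prediction model's validity be demonstrated outside the sample in which it was developed. This requirement is often met by randomly splitting the available data into a development set and a validation set. Is such data splitting a waste of resources? Splitting reduces the data available for model development and often leaves a validation set that is too small for precise estimates of performance. In large samples, the interest should instead shift to assessing heterogeneity in model performance across settings, such as different hospitals or time periods. In small samples, cross-validation and bootstrapping are more efficient approaches, because every patient contributes to both model development and performance assessment. In conclusion, random data splitting should be abolished for validation of prediction models.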
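
To make the contrast between random splitting and the more efficient alternatives concrete, here is a minimal sketch in Python (standard library only) that applies all three strategies to one simulated dataset. Everything in it is an assumption for illustration: the simulated data (200 patients, five predictors of which only two carry signal), the function names, the 2:1 split ratio, and the use of an ordinary least-squares linear score as a dependency-free stand-in for logistic regression. Performance is measured by the c-statistic (area under the ROC curve). The bootstrap part follows the usual optimism-correction recipe: refit the model in each bootstrap sample, record how much its c-statistic drops from the bootstrap sample to the original data, and subtract the average drop from the apparent c-statistic.

```python
import math
import random
import statistics


def simulate(n, rng):
    """Hypothetical data: five standard-normal predictors, only the first two predictive."""
    beta = [0.8, 0.5, 0.0, 0.0, 0.0]
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        lp = -0.5 + sum(b * v for b, v in zip(beta, x))
        X.append(x)
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-lp)) else 0)
    return X, y


def fit(X, y):
    """Least-squares linear score with intercept (stand-in for logistic regression)."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    # Augmented normal equations [Z'Z | Z'y], solved by Gaussian elimination.
    M = [[sum(z[i] * z[j] for z in Z) for j in range(k)]
         + [sum(z[i] * t for z, t in zip(Z, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (M[i][k] - sum(M[i][j] * coef[j] for j in range(i + 1, k))) / M[i][i]
    return coef


def predict(coef, X):
    return [coef[0] + sum(b * v for b, v in zip(coef[1:], x)) for x in X]


def c_statistic(scores, y):
    """Share of event/non-event pairs ranked correctly; ties count one half."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def split_sample(X, y, rng, frac=2 / 3):
    """Random split: develop on `frac` of the patients, validate on the rest."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    cut = int(frac * len(idx))
    dev, val = idx[:cut], idx[cut:]
    coef = fit([X[i] for i in dev], [y[i] for i in dev])
    return c_statistic(predict(coef, [X[i] for i in val]), [y[i] for i in val])


def cross_validated(X, y, rng, k=10):
    """k-fold cross-validation; c-statistic of the pooled out-of-fold predictions."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    scores = [0.0] * len(y)
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        coef = fit([X[i] for i in train], [y[i] for i in train])
        for i, s in zip(fold, predict(coef, [X[i] for i in fold])):
            scores[i] = s
    return c_statistic(scores, y)


def bootstrap_corrected(X, y, rng, B=100):
    """Apparent c-statistic and its bootstrap optimism-corrected version."""
    n = len(y)
    apparent = c_statistic(predict(fit(X, y), X), y)
    optimism = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        coef = fit(Xb, yb)
        optimism.append(c_statistic(predict(coef, Xb), yb) - c_statistic(predict(coef, X), y))
    return apparent, apparent - statistics.mean(optimism)


if __name__ == "__main__":
    rng = random.Random(2018)
    X, y = simulate(200, rng)
    apparent, corrected = bootstrap_corrected(X, y, rng)
    print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
    print(f"10-fold cross-validated c = {cross_validated(X, y, rng):.3f}")
    splits = [split_sample(X, y, rng) for _ in range(20)]
    print(f"20 random 2:1 splits: c from {min(splits):.3f} to {max(splits):.3f}")
```

Repeating the random split shows how much a single split-sample estimate can depend on which patients happen to land in the validation set, whereas the bootstrap and cross-validation estimates use all patients for both development and evaluation. The exact numbers depend on the seed and the simulated data, so none are claimed here.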
