Recent Editorials in this journal stressed the classical paradigm in clinical epidemiology of insisting on test-retest evaluations for studies on diagnosis and prognosis [1] and specifically prediction models [2]. Indeed, independent validation of previous research findings is an important scientific principle.

Another recent debate concerned the interpretation of the lack of external validation studies of published novel prediction models [3-5]. One issue is the role that validation should have at the time of model development. Many researchers may be tempted to report some proof of external validity, that is, of discrimination and calibration in independent samples, with the publication that proposes a new prediction model. Major clinical journals currently seem to appreciate such reporting. Another issue is whether external validation should be performed by authors other than those involved in the development of the prediction model [3,6]. We would like to comment on these and related key issues in the scientific basis of prediction modeling.

The recent review confirms that model development studies are often relatively small for the complex challenges posed by specifying the form of a prediction model (which predictors to include) and estimating the predictor effects (overfitting with standard estimation methods) [3]. The median sample size was 445 subjects. The number of events is the limiting factor in this type of research and may be far too low for reliable modeling [4]. In such small samples, internal validation is essential, because apparent performance estimates are severely optimistic (Fig. 1). Bootstrapping is the preferred approach for internal validation of prediction models [7-9]. A bootstrap procedure should include all modeling steps for an honest assessment of model performance [10]. Specifically, any model selection steps, such as variable selection, need to be repeated per bootstrap sample if used.
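To make this procedure concrete, the following is a minimal sketch of optimism-corrected bootstrap internal validation in Python with scikit-learn. The simulated data, the univariable screening rule (p < 0.05), and the 200 bootstrap replicates are illustrative assumptions, not details from this commentary; the essential point is that the entire modeling strategy, including variable selection, is repeated in every bootstrap sample.

# Minimal sketch of optimism-corrected (bootstrap) internal validation.
# Data, screening rule, and replicate count are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def fit_strategy(X, y):
    # The FULL modeling strategy: univariable screening, then model
    # fitting. Repeating both steps per bootstrap sample is what makes
    # the validation honest.
    _, pvals = f_classif(X, y)
    keep = np.where(pvals < 0.05)[0]
    if keep.size == 0:                  # fall back to all predictors
        keep = np.arange(X.shape[1])
    model = LogisticRegression(max_iter=1000).fit(X[:, keep], y)
    return model, keep

def auc(model, keep, X, y):
    return roc_auc_score(y, model.predict_proba(X[:, keep])[:, 1])

# A small development sample with many candidate predictors.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

model, keep = fit_strategy(X, y)
apparent = auc(model, keep, X, y)       # optimistic by construction

optimism = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))      # bootstrap resample
    Xb, yb = X[idx], y[idx]
    mb, kb = fit_strategy(Xb, yb)              # redo ALL modeling steps
    optimism.append(auc(mb, kb, Xb, yb) - auc(mb, kb, X, y))

corrected = apparent - np.mean(optimism)
print(f"apparent AUC {apparent:.3f}, optimism-corrected AUC {corrected:.3f}")

With a small sample and many candidate predictors, the corrected c-statistic is typically noticeably lower than the apparent one, which is exactly the optimism that internal validation is meant to expose.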
We recently confirmed that a split sample approach with 50% held out leads to models with suboptimal performance, that is, models with unstable and on average the same performance as obtained with half the sample size [11]. We hence strongly advise against random split sample approaches in small development samples. Split sample approaches can be used in very large samples, but again, we advise against this practice: if the sample size is so large that a split sample procedure can be afforded, overfitting is no longer an issue. Split sample approaches only work when not needed.

More relevant are attempts to obtain impressions of external validity: do model predictions hold true in different settings, for example, in subjects from other centers or subjects seen more recently? Here, a nonrandom split can often be made in the development sample, for example by year of diagnosis: we might validate a model on the most recent one-third of the sample, held out from model development. Because the split is in time, this would qualify as a temporal external validation [6]. Unfortunately, the disadvantages of a random split sample approach equally hold here: a poorer model is developed (on a smaller sample size than the full development sample), and the validation findings are unstable (based on a small sample size) [9].

We make two propositions for validation at the time of prediction model development (Fig. 2). First, we recommend an "internal-external" validation procedure. In the context of individual patient data meta-analysis (IPD-MA), internal-external cross-validation has been used to show external validity of a prediction model [12,13]. In an MA context, the natural unit for splitting is by study. Every study is left out once, for validation of a model based on the remaining studies. The final model is based on the pooled data set, which we label an "internally-externally validated" model.
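The logic of this procedure can be summarized in a few lines of code. Below is a minimal sketch of internal-external (leave-one-study-out) cross-validation, again in Python with scikit-learn; the six equally sized hypothetical studies, the simulated data, and the logistic regression model are assumptions for illustration only.

# Minimal sketch of internal-external (leave-one-study-out)
# cross-validation for an IPD-MA setting. Studies, data, and model are
# illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1200, n_features=8, n_informative=4,
                           random_state=1)
study = np.repeat(np.arange(6), 200)    # 6 hypothetical studies of 200 each

# Each study is left out once and used to validate a model developed on
# the remaining studies.
for s in np.unique(study):
    train, test = study != s, study == s
    m = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    c = roc_auc_score(y[test], m.predict_proba(X[test])[:, 1])
    print(f"study {s} held out: c-statistic {c:.3f}")

# The final, "internally-externally validated" model is then fit on the
# pooled data set.
final_model = LogisticRegression(max_iter=1000).fit(X, y)

Reporting the held-out performance per study shows how well predictions transfer across settings, which a single random split cannot, while the final model still exploits the full pooled sample.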
References

[1] Ewout W. Steyerberg, et al. Internal and external validation of predictive models: a simulation study of bias and precision in small samples. Journal of Clinical Epidemiology, 2003.
[2] J. Habbema, et al. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology, 2001.
[3] J. Knottnerus, et al. Clinical prediction models are not being validated. Journal of Clinical Epidemiology, 2015.
[4] G. Bedogni, et al. Clinical Prediction Models: A Practical Approach to Development, Validation and Updating. 2009.
[5] Patrick Royston, et al. Construction and validation of a prognostic model across several studies, with an application in superficial bladder cancer. Statistics in Medicine, 2004.
[6] Karel G. M. Moons, et al. A new framework to enhance the interpretation of external validation studies of clinical prediction models. Journal of Clinical Epidemiology, 2015.
[7] John P. A. Ioannidis, et al. Response to letter by Forike et al.: more rigorous, not less, external validation is needed. Journal of Clinical Epidemiology, 2016.
[8] J. Ioannidis, et al. External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination. Journal of Clinical Epidemiology, 2015.
[9] K. Covinsky, et al. Assessing the Generalizability of Prognostic Information. Annals of Internal Medicine, 1999.
[10] Sunil J. Rao, et al. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. 2003.
[11] Gary S. Collins, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration. Annals of Internal Medicine, 2015.
[12] G. Collins, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Annals of Internal Medicine, 2015.
[13] A. Cecile J. W. Janssens, et al. External validation is only needed when prediction models are worth it (Letter commenting on: J Clin Epidemiol. 2015;68:25-34). Journal of Clinical Epidemiology, 2016.
[14] P. Austin, et al. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Statistical Methods in Medical Research, 2014.
[15] J. Knottnerus, et al. Transferability/generalizability deserves more attention in 'retest' studies in Diagnosis and Prognosis. Journal of Clinical Epidemiology, 2015.
[16] C. Legrand, et al. Validation of prognostic indices using the frailty model. Lifetime Data Analysis, 2009.
[17] Yvonne Vergouwe, et al. Towards better clinical prediction models: seven steps for development and an ABCD for validation. European Heart Journal, 2014.
[18] J. Stockman, et al. Prediction of MLH1 and MSH2 Mutations in Lynch Syndrome. 2008.
[19] Juan Lu, et al. Predicting Outcome after Traumatic Brain Injury: Development and International Validation of Prognostic Scores Based on Admission Characteristics. PLoS Medicine, 2008.