Predictive value of statistical models

A review is given of different ways of estimating the error rate of a prediction rule based on a statistical model. A distinction is drawn between apparent, optimum and actual error rates. Moreover, it is shown how cross-validation can be used to obtain an adjusted predictor with a smaller error rate. A detailed discussion is given for ordinary least squares, logistic regression and Cox regression in survival analysis. Finally, the split-sample approach is discussed and demonstrated on two data sets.
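The gap between the apparent error rate and a cross-validated estimate can be illustrated for ordinary least squares. The following is a minimal sketch, not the paper's own procedure: the simulated data, the function names `ols_apparent_and_loo` and `calibration_slope`, and the use of a calibration slope as the shrinkage factor are all assumptions made for illustration. It relies on the standard identity that the leave-one-out residual in OLS equals the ordinary residual divided by one minus the leverage.

```python
import numpy as np

def ols_apparent_and_loo(X, y):
    """Return apparent MSE, leave-one-out MSE and LOO predictions for OLS."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])        # add an intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta                        # residuals on the training data
    # Leverages h_ii are the diagonal of the hat matrix H = Xd (Xd'Xd)^-1 Xd'
    H = Xd @ np.linalg.pinv(Xd.T @ Xd) @ Xd.T
    h = np.diag(H)
    loo_resid = resid / (1.0 - h)                # exact leave-one-out residuals
    loo_pred = y - loo_resid                     # prediction for i without case i
    return np.mean(resid**2), np.mean(loo_resid**2), loo_pred

def calibration_slope(y, loo_pred):
    """Slope of y regressed on the cross-validated predictions (hypothetical helper)."""
    p = loo_pred - loo_pred.mean()
    return float(p @ (y - y.mean()) / (p @ p))

# Simulated data (an assumption): 40 cases, 8 covariates, only one with a real effect.
rng = np.random.default_rng(0)
n, k = 40, 8
X = rng.normal(size=(n, k))
y = 0.5 * X[:, 0] + rng.normal(size=n)

apparent, loo, loo_pred = ols_apparent_and_loo(X, y)
slope = calibration_slope(y, loo_pred)
print(f"apparent MSE: {apparent:.3f}, leave-one-out MSE: {loo:.3f}")
print(f"calibration slope of y on cross-validated predictions: {slope:.3f}")
```

In this sketch the apparent error is always smaller than the leave-one-out error, because each leave-one-out residual is the ordinary residual inflated by the factor 1/(1 - h_ii). A calibration slope below one would suggest that shrinking the fitted predictor toward the mean could reduce its error on new data.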
