Development of a clinical prediction model for an ordinal outcome: the World Health Organization Multicentre Study of Clinical Signs and Etiological agents of Pneumonia, Sepsis and Meningitis in Young Infants. WHO/ARI Young Infant Multicentre Study Group.

This paper describes the methodologies used to develop a prediction model to assist health workers in developing countries in facing one of the most difficult health problems in all parts of the world: the presentation of an acutely ill young infant. Statistical approaches for developing the clinical prediction model faced at least two major difficulties. First, the number of predictor variables, especially clinical signs and symptoms, is very large, necessitating the use of data reduction techniques that are blinded to the outcome. Second, there is no uniquely accepted continuous outcome measure or final binary diagnostic criterion; for example, the diagnosis of neonatal sepsis is ill-defined. Clinical decision makers must both identify infants likely to have positive cultures and grade the severity of illness. In the WHO/ARI Young Infant Multicentre Study we have found that an ordinal outcome scale made up of a mixture of laboratory and diagnostic markers has several clinical advantages and also increases the power of tests for risk factors. Such a mixed ordinal scale does present statistical challenges, because it may violate the constant-slope assumptions of ordinal regression models. In this paper we develop and validate an ordinal predictive model after choosing a data reduction technique. We show how to check the ordinality of the outcome against each predictor. We describe new but simple techniques for graphically examining residuals from ordinal logistic models to detect problems with variable transformations, non-proportional odds and other lack of fit. We examine an alternative type of ordinal logistic model, the continuation ratio model, to determine whether it provides a better fit. We find that it does not, but that this model is easily modified to allow the regression coefficients to vary with the cut-offs of the response variable. Complex terms in this extended model are penalized so that the model has only as much complexity as the data will support.
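Checking ordinality against a single predictor can be as simple as computing the mean of the predictor within each outcome level and asking whether those means move in one direction as the outcome becomes more severe. The sketch below assumes a numeric predictor and an outcome coded 0, 1, ..., k; the helper names `mean_x_by_level` and `is_monotone` and the data are hypothetical, and this is not the authors' implementation, which the abstract describes as graphical.

```python
import numpy as np

def mean_x_by_level(x, y):
    """Return the sorted outcome levels and the mean of x within each level."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    levels = np.unique(y)
    means = np.array([x[y == lev].mean() for lev in levels])
    return levels, means

def is_monotone(means):
    """True if the level means never decrease or never increase."""
    steps = np.diff(means)
    return bool(np.all(steps >= 0) or np.all(steps <= 0))

# Hypothetical data: a sign score that tends to rise with outcome severity
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 1, 1, 2, 2]
levels, means = mean_x_by_level(x, y)
print(levels, means, is_monotone(means))
```

Plotting `means` against `levels` for each predictor gives the graphical version; a reversal in direction suggests that the predictor does not relate to the outcome scale in an ordinal way.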
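The continuation ratio model can be fitted with ordinary binary logistic regression after expanding the data: each subject contributes one record per cut-off j that they reach (Y >= j), with response 1 if Y = j, so the binary model estimates Pr(Y = j | Y >= j, X). Adding cut-off-by-predictor interactions gives an extended model in which slopes vary with the cut-off, and a quadratic (ridge) penalty on only those interactions limits the extra complexity. The following is a minimal numpy sketch under those assumptions; the function names, the quadratic form of the penalty and the simulated data are illustrative choices, not taken from the paper.

```python
import numpy as np

def cr_expand(X, y, k):
    """Expand ordinal y in {0, ..., k} into continuation-ratio binary records.

    For each subject and each cut-off j = 0..k-1 with y >= j, emit one record
    whose response is 1 if y == j and 0 otherwise.
    """
    rows, resp, cohort = [], [], []
    for xi, yi in zip(X, y):
        for j in range(min(int(yi), k - 1) + 1):
            rows.append(xi)
            resp.append(1.0 if yi == j else 0.0)
            cohort.append(j)
    return np.array(rows, dtype=float), np.array(resp), np.array(cohort)

def cr_design(Xe, cohort, k, extended=False):
    """Columns: one intercept per cut-off, common slopes and, if extended,
    cut-off-specific slope shifts for cut-offs 1..k-1."""
    m = len(cohort)
    intercepts = np.zeros((m, k))
    intercepts[np.arange(m), cohort] = 1.0
    parts = [intercepts, Xe]
    if extended:
        for j in range(1, k):
            parts.append(Xe * (cohort == j)[:, None])
    return np.hstack(parts)

def fit_penalized_logistic(Z, r, penalty_mask, lam, n_iter=100, tol=1e-10):
    """Newton-Raphson maximisation of loglik - (lam / 2) * sum(mask * beta**2)."""
    beta = np.zeros(Z.shape[1])
    P = lam * np.diag(penalty_mask.astype(float))
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        grad = Z.T @ (r - prob) - P @ beta
        info = Z.T @ (Z * (prob * (1.0 - prob))[:, None]) + P
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated (hypothetical) data from a CR model with a common slope: Y in {0, 1, 2}
rng = np.random.default_rng(0)
n, k, p = 2000, 2, 1
alpha, b = np.array([-1.0, 0.0]), 1.0
x = rng.normal(size=(n, p))
y = np.full(n, k)
for i in range(n):
    for j in range(k):
        # Given Y >= j, stop at level j with probability expit(alpha_j + b * x)
        if rng.random() < 1.0 / (1.0 + np.exp(-(alpha[j] + b * x[i, 0]))):
            y[i] = j
            break

Xe, r, cohort = cr_expand(x, y, k)
Z = cr_design(Xe, cohort, k, extended=True)
mask = np.r_[np.zeros(k + p), np.ones((k - 1) * p)]  # penalize interactions only
beta_flat = fit_penalized_logistic(Z, r, mask, lam=1e6)  # heavy penalty
beta_free = fit_penalized_logistic(Z, r, mask, lam=0.0)  # no penalty
print(beta_flat.round(3), beta_free.round(3))
```

With a very large `lam` the interaction terms are shrunk essentially to zero, recovering the ordinary continuation ratio model with common slopes; with `lam = 0` each cut-off gets its own slope. In practice the penalty would be chosen between these extremes, for example by an information criterion, which this sketch does not do.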
We approximate the extended continuation ratio model by a simpler model with fewer terms so that a nomogram can be drawn for obtaining various predictions. The model is validated for calibration and discrimination using the bootstrap. We apply much of the modelling strategy described in Harrell, Lee and Mark (Statist. Med. 15, 361-387 (1996)) for survival analysis, adapting it to ordinal logistic regression and placing further emphasis on penalized maximum likelihood estimation and data reduction.
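One way to approximate a complex model with a simpler one, assumed here rather than taken from the paper, is to regress the full model's linear predictor on the terms to be retained using ordinary least squares; the R² of that regression measures how faithfully the reduced model reproduces the full model's predictions. The function name `approximate_linear_predictor` and the data below are hypothetical.

```python
import numpy as np

def approximate_linear_predictor(lp_full, X_sub):
    """Least-squares approximation of a full model's linear predictor.

    Regresses lp_full on an intercept plus a reduced set of predictors and
    returns the coefficients and the R^2 of the approximation.
    """
    A = np.column_stack([np.ones(len(lp_full)), X_sub])
    coef, *_ = np.linalg.lstsq(A, lp_full, rcond=None)
    fitted = A @ coef
    ss_res = np.sum((lp_full - fitted) ** 2)
    ss_tot = np.sum((lp_full - lp_full.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot

# Hypothetical "full" model: three predictors, one with a tiny coefficient
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
lp_full = 0.5 + X @ np.array([1.2, -0.8, 0.05])
coef, r2 = approximate_linear_predictor(lp_full, X[:, :2])  # drop the third term
print(coef.round(3), round(r2, 4))
```

The reduced coefficients are what a nomogram would display; terms can be dropped one at a time until R² falls below whatever fidelity the analyst is willing to accept.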
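Bootstrap validation of discrimination is often done with the optimism bootstrap: refit the model on each bootstrap sample, measure how much better it performs on that sample than on the original data, and subtract the average of this optimism from the apparent performance. The sketch below assumes this variant (the abstract does not specify one), uses Somers' D_xy between the linear predictor and the ordinal outcome as the discrimination index and, to stay short, substitutes a binary logistic model for Y >= 1 in place of the ordinal model. The function names and data are hypothetical, and calibration is not covered.

```python
import numpy as np

def somers_dxy(score, y):
    """Somers' Dxy between a predicted score and an ordinal outcome y.

    Among pairs with different y: (concordant - discordant) / number of pairs.
    Pairs tied on the score count as neither. Uses O(n^2) memory.
    """
    score = np.asarray(score, dtype=float)
    y = np.asarray(y, dtype=float)
    ds = np.sign(score[:, None] - score[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    return float(np.sum(ds * dy) / np.sum(dy != 0))

def fit_logistic(X, r, lam=1e-6, n_iter=50):
    """Binary logistic regression by Newton-Raphson with a tiny ridge for stability."""
    Z = np.column_stack([np.ones(len(r)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        grad = Z.T @ (r - prob) - lam * beta
        info = Z.T @ (Z * (prob * (1.0 - prob))[:, None]) + lam * np.eye(len(beta))
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def bootstrap_validate_dxy(X, y, B=100, seed=0):
    """Return (apparent Dxy, estimated optimism, optimism-corrected Dxy)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Stand-in model: logistic regression for Y >= 1; its linear predictor ranks subjects
    beta = fit_logistic(X, (y >= 1).astype(float))
    apparent = somers_dxy(X @ beta[1:], y)
    optimism = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        b = fit_logistic(X[idx], (y[idx] >= 1).astype(float))
        d_boot = somers_dxy(X[idx] @ b[1:], y[idx])  # performance on the bootstrap sample
        d_orig = somers_dxy(X @ b[1:], y)            # same fit, evaluated on original data
        optimism.append(d_boot - d_orig)
    opt = float(np.mean(optimism))
    return apparent, opt, apparent - opt

# Hypothetical data: one informative predictor, four noise predictors
rng = np.random.default_rng(42)
n = 150
X = rng.normal(size=(n, 5))
latent = 1.5 * X[:, 0] + rng.logistic(size=n)
y = np.digitize(latent, [-1.0, 1.0])  # ordinal outcome 0, 1, 2
apparent, opt, corrected = bootstrap_validate_dxy(X, y)
print(round(apparent, 3), round(opt, 3), round(corrected, 3))
```

Because each bootstrap fit is scored both on the sample it was fitted to and on the original data, the average difference estimates how much the apparent D_xy overstates performance on new infants.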

[1] W. Hoeffding. A Non-Parametric Test of Independence. 1948.

[2] G. Brier. Verification of Forecasts Expressed in Terms of Probability. 1950.

[3] Strother H. Walker, et al. Estimation of the probability of an event as a function of several independent variables. Biometrika, 1967.

[4] David R. Cox, et al. Regression models and life tables (with discussion). 1972.

[5] G. Schwarz. Estimating the Dimension of a Model. 1978.

[6] W. Cleveland. Robust Locally Weighted Regression and Smoothing Scatterplots. 1979.

[7] P. McCullagh. Regression Models for Ordinal Data. 1980.

[8] A. Atkinson. A note on the generalized information criterion for choice of a model. 1980.

[9] J. Anderson, et al. Regression, Discrimination and Measurement Models for Ordered Categorical Variables. 1981.

[10] David A. Schoenfeld, et al. Partial residuals for the proportional hazards regression model. 1982.

[11] J. E. Jackson, et al. Factor analysis, an applied approach. 1983.

[12] J. Anderson. Regression and Ordered Categorical Variables. 1984.

[13] D. Pregibon, et al. Graphical Methods for Assessing Logistic Regression Models. 1984.

[14] F. Harrell, et al. Regression modelling strategies for improved prognostic prediction. Statistics in Medicine, 1984.

[15] F. Harrell, et al. Regression models for prognostic prediction: advantages, problems, and suggested solutions. Cancer Treatment Reports, 1985.

[16] G. Koch, et al. Two stage procedure for the analysis of ordinal categorical data. 1985.

[17] D. J. Spiegelhalter, et al. Probabilistic prediction in patient management and clinical trials. Statistics in Medicine, 1986.

[18] F. Harrell, et al. Partial Proportional Odds Models for Ordinal Response Variables. 1990.

[19] F. Harrell, et al. Regression models in clinical studies: determining relationships between predictors and response. Journal of the National Cancer Institute, 1988.

[20] T. Hastie, et al. Regression with an ordered categorical response. Statistics in Medicine, 1989.

[21] D. Ashby, et al. The ordered logistic regression model in psychiatry: rising prevalence of dementia in old people's homes. Statistics in Medicine, 1989.

[22] B. Armstrong, et al. Ordinal regression models for epidemiologic data. American Journal of Epidemiology, 1989.

[23] D. Altman, et al. Bootstrap investigation of the stability of a Cox regression model. Statistics in Medicine, 1989.

[24] A. Agresti, et al. A survey of models for repeated ordered categorical response data. Statistics in Medicine, 1989.

[25] Clifford M. Hurvich, et al. Regression and time series model selection in small samples. 1989.

[26] Frank E. Harrell, et al. The restricted cubic spline hazard model. 1990.

[27] S. J. Pocock, et al. Prognostic scores for detecting a high risk group: estimating the sensitivity when applied to new data. Statistics in Medicine, 1990.

[28] J. C. van Houwelingen, et al. Predictive value of statistical models. 1990.

[29] N. Nagelkerke, et al. A note on a general definition of the coefficient of determination. 1991.

[30] F. Harrell, et al. Using ordinal logistic regression to estimate the likelihood of colorectal neoplasia. Journal of Clinical Epidemiology, 1991.

[31] Colin J. Morley, et al. A scoring system to quantify illness in babies under 6 months of age. 1991.

[32] J. Whitehead, et al. Analysis of failure time data with ordinal categories of response. Statistics in Medicine, 1991.

[33] J. Faraway. On the Cost of Data Analysis. 1992.

[34] Robert Gray, et al. Flexible Methods for Analyzing Survival Data Using Splines, with Applications to Breast Cancer Prognosis. 1992.

[35] Jim Freeman. A User's Guide to Principal Components. 1992.

[36] S. le Cessie, et al. Ridge Estimators in Logistic Regression. 1992.

[37] Robert Tibshirani, et al. An Introduction to the Bootstrap. 1994.

[38] J. Whitehead. Sample size calculations for ordered categorical data. Statistics in Medicine, 1993.

[39] W. G. Henderson, et al. Assessment of predictive models for binary outcomes: an empirical approach using operative death from cardiac surgery. Statistics in Medicine, 1994.

[40] S. Greenland, et al. Alternative models for ordinal logistic regression. Statistics in Medicine, 1994.

[41] P. J. Verweij, et al. Penalized likelihood in Cox regression. Statistics in Medicine, 1994.

[42] X. H. Zhou, et al. Effect of verification bias on positive and negative predictive values. Statistics in Medicine, 1994.

[43] P. Grambsch, et al. Proportional hazards tests and diagnostics based on weighted residuals. 1994.

[44] D. Collett, et al. Modelling Binary Data. 1994.

[45] Leslie Lamport, et al. LaTeX: A Document Preparation System: User's Guide and Reference Manual, Second Edition. 1994.

[46] R. D'Agostino, et al. Development of health risk appraisal functions in the presence of multiple indicators: the Framingham Study nursing home institutionalization model. Statistics in Medicine, 1995.

[47] C. Cox, et al. Location-scale cumulative odds models for ordinal data: a generalized non-linear model approach. Statistics in Medicine, 1995.

[48] D. Follmann. Multivariate tests for multiple endpoints in clinical trials. Statistics in Medicine, 1995.

[49] C. Wild, et al. Vector Generalized Additive Models. 1996.

[50] William N. Venables, et al. Modern Applied Statistics with S-Plus. 1996.

[51] F. Harrell, et al. Prognostic/Clinical Prediction Models: Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors. 2005.

[52] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. 1996.

[53] Christopher M. Bishop, et al. Classification and regression. 1997.