Creating diagnostic scores using data-adaptive regression: An application to prediction of 30-day mortality among stroke victims in a rural hospital in India

Developing diagnostic scores for the prediction of clinical outcomes draws on both medical knowledge of which variables are most important and empirical/statistical learning to find the functional form of these covariates that provides the most accurate prediction (e.g., highest specificity and sensitivity). Given the variables chosen by the clinician as most relevant, or available given limited resources, the job is a purely statistical one: which model, among competitors, provides the most accurate prediction of clinical outcomes, where accuracy is defined relative to some loss function. An optimal algorithm for choosing a model (1) provides a flexible sequence of models that can ‘twist and bend’ to fit the data and (2) uses a validation procedure that optimally balances bias and variance by choosing models of the right size (complexity). We propose a solution for creating diagnostic scores that, given the available variables, appropriately trades off model complexity against variability of estimation; the algorithm combines machine learning, logistic regression (POLYCLASS), and cross-validation. As an example, we apply the procedure to data collected from stroke victims in a rural clinic in India, where the outcome of interest is death within 30 days. A quick and accurate diagnosis of stroke is important for immediate resuscitation. Equally important is giving patients and their families an indication of the prognosis. Accurate predictions of clinical outcomes made soon after the onset of stroke can also help in choosing appropriate supportive treatment. Severity scores have been created in developed nations (for instance, Guy’s Prognostic Score, the Canadian Neurological Score, and the National Institutes of Health Stroke Scale). However, we propose a method for developing scores appropriate to local settings in possibly very different medical circumstances.
Specifically, we used a freely available and easy-to-use exploratory regression technique (POLYCLASS) to predict 30-day mortality following stroke in a rural Indian population and compared its accuracy with that of these existing stroke scales. POLYCLASS gave more accurate prediction than the existing scores (sensitivity and specificity of 90% and 76%, respectively). This method can easily be extended to different clinical settings and to different disease outcomes. In addition, the software and algorithms used are open-source (free), and we provide the code in the appendix.
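
To make the two ingredients described above concrete (a sequence of increasingly flexible models, and cross-validation to pick the right complexity), the following is a minimal Python sketch. It is not POLYCLASS and not the authors' appendix code: the binned classifier, the simulated data, the log-likelihood loss, the 0.5 cutoff, and every function name are assumptions chosen only for illustration.

```python
import math
import random

def fit_binned(xs, ys, k):
    """Piecewise-constant estimate of P(Y=1 | X) on k equal-width bins of [0, 1)."""
    pos, tot = [0] * k, [0] * k
    for x, y in zip(xs, ys):
        b = min(int(x * k), k - 1)
        pos[b] += y
        tot[b] += 1
    # Add-one smoothing keeps every probability strictly inside (0, 1).
    return [(p + 1) / (t + 2) for p, t in zip(pos, tot)]

def predict(model, x):
    k = len(model)
    return model[min(int(x * k), k - 1)]

def log_loss(model, xs, ys):
    """Average negative log-likelihood: the loss function used for selection."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(model, x)
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total / len(xs)

def cv_risk(xs, ys, k, folds=5):
    """V-fold cross-validated risk of the k-bin model."""
    n = len(xs)
    risks = []
    for v in range(folds):
        train = [i for i in range(n) if i % folds != v]
        valid = [i for i in range(n) if i % folds == v]
        model = fit_binned([xs[i] for i in train], [ys[i] for i in train], k)
        risks.append(log_loss(model, [xs[i] for i in valid], [ys[i] for i in valid]))
    return sum(risks) / folds

def select_complexity(xs, ys, candidates, folds=5):
    """Return the candidate complexity with the smallest cross-validated risk."""
    risks = {k: cv_risk(xs, ys, k, folds) for k in candidates}
    return min(risks, key=risks.get), risks

def sensitivity_specificity(model, xs, ys, cutoff=0.5):
    tp = fn = tn = fp = 0
    for x, y in zip(xs, ys):
        pred = 1 if predict(model, x) >= cutoff else 0
        if y == 1:
            tp, fn = tp + pred, fn + (1 - pred)
        else:
            tn, fp = tn + (1 - pred), fp + pred
    return tp / (tp + fn), tn / (tn + fp)

# Simulated data (an assumption, not the stroke data): one covariate in [0, 1),
# with the event probability jumping from 0.1 to 0.9 at x = 0.5.
rng = random.Random(0)
xs = [rng.random() for _ in range(400)]
ys = [1 if rng.random() < (0.9 if x >= 0.5 else 0.1) else 0 for x in xs]

candidates = [1, 2, 4, 8, 16, 32, 64]   # increasing model complexity
best_k, risks = select_complexity(xs, ys, candidates)

# Refit the selected model on all data. These in-sample figures are optimistic;
# an honest estimate would use held-out or cross-validated predictions.
final_model = fit_binned(xs, ys, best_k)
sens, spec = sensitivity_specificity(final_model, xs, ys)
print(best_k, round(sens, 2), round(spec, 2))
```

The one-bin model is too rigid to capture the jump in risk, while the models with many bins fit noise in sparsely populated bins; the cross-validated risk penalizes both, which is the bias/variance trade-off the abstract refers to. A data-adaptive method such as POLYCLASS searches a far richer model space, but the selection principle sketched here is the same.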
