Predicting Severe COPD Exacerbations: Developing a Population Surveillance Approach with Administrative Data.

RATIONALE Automatic prediction algorithms based on routinely collected health data may be able to identify patients at high risk of hospitalization related to acute exacerbations of chronic obstructive pulmonary disease (COPD).

OBJECTIVE This was a proof-of-concept study of a population surveillance approach to identifying individuals at high risk of severe COPD exacerbations.

METHODS We used British Columbia's administrative health databases (1997-2016) to identify patients with diagnosed COPD. For each patient, we used data from the six months before a randomly selected index date to predict the risk of severe exacerbation in the two months after that date. We applied statistical and machine learning algorithms for risk prediction (logistic regression, random forest, neural network, and gradient boosting). We evaluated model performance with calibration plots and receiver operating characteristic (ROC) curves, using a second randomly chosen index date at least one year later (temporal validation).

RESULTS There were 108,433 patients in the development dataset and 113,786 in the validation dataset; of these, 1,126 and 1,136, respectively, were hospitalized due to COPD within their outcome windows. The best prediction algorithm (gradient boosting) had an area under the ROC curve of 0.82 (95% CI 0.80-0.83), significantly higher than the corresponding value of 0.68 for the model with exacerbation history as the only predictor (the current standard of care). The predicted risk scores were well calibrated in the validation dataset.

CONCLUSIONS Imminent COPD-related hospitalizations can be predicted with good accuracy using administrative health data. This model may be used to target high-risk patients for preventive exacerbation therapies.
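The two evaluation measures named in the methods, the area under the ROC curve and calibration, can be illustrated with a minimal Python sketch. This is not the study's code: the function names `auc` and `calibration_table`, the choice of ten risk groups, and the synthetic risk scores are all assumptions made for illustration. The synthetic event rate is set near 1% only because 1,126 of 108,433 patients in the development dataset had the outcome.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    Tied scores receive their average rank, so ties count as half a win.
    Assumes labels are 0/1 and both classes are present.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_table(scores, labels, n_groups=10):
    """Mean predicted risk and observed event rate per risk group.

    Groups are formed by sorting on predicted risk and cutting into
    n_groups chunks of (nearly) equal size; the last chunk takes any
    remainder. Assumes at least n_groups observations.
    """
    pairs = sorted(zip(scores, labels))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


# Synthetic data only (hypothetical, not from the study): risks drawn from
# Beta(1, 99), whose mean is 1%, and outcomes simulated from those risks,
# so the scores are calibrated by construction.
random.seed(0)
risks = [random.betavariate(1, 99) for _ in range(20000)]
outcomes = [1 if random.random() < r else 0 for r in risks]

print(f"AUC on synthetic data: {auc(risks, outcomes):.3f}")
for mean_pred, observed in calibration_table(risks, outcomes):
    print(f"predicted {mean_pred:.4f}  observed {observed:.4f}")
```

In a well-calibrated model the two columns of the table track each other across risk groups, which is what a calibration plot shows graphically. The sketch does not cover the confidence interval or the significance test for the difference between two AUCs reported in the results.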
