Looking Beyond Historical Patient Outcomes to Improve Clinical Models

Clinical models can be improved by decreasing the importance assigned to fitting historical patient outcomes in often small and imperfectly characterized derivation cohorts.

When Less Is More

Clinical models play an important role in guiding patient care at the bedside, improving our understanding of diseases, and enabling objective assessment of healthcare quality. The typical approach to developing these models places great importance on fitting historical patient outcomes in derivation data sets (such as those obtained from clinical studies or patient registries). However, for a fairly broad range of medical applications, these derivation data sets may be small after accounting for inclusion and exclusion criteria, and may additionally be imperfectly characterized owing to noise and variation in the rates of patient outcomes. Collecting more data is one way to address this issue, but it is challenging given the cost and complexity of enlarging clinical cohorts. With small and imperfectly characterized data sets, approaches to developing clinical models that rely exclusively on fitting historical patient outcomes rest on the implicit assumption that the derivation data are representative. Instead, as the new study by Chia et al. explores, the development of clinical models can be improved by decreasing the importance placed on fitting historical patient outcomes and by supplementing these models with information about the extent to which patients differ from the statistical distribution of clinical characteristics within the derivation data set.
When evaluated on data from three different clinical applications [patients with acute coronary syndrome enrolled in the DISPERSE2-TIMI33 and MERLIN-TIMI36 trials, patients undergoing inpatient surgery in the National Surgical Quality Improvement Program (NSQIP) registry, and patients undergoing percutaneous coronary intervention in the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) registry], this approach of treating the derivation data for clinical models as simultaneously labeled and unlabeled consistently improved discrimination between high- and low-risk patients across several statistical metrics. Decreasing the importance assigned to fitting historical outcomes thus allows for better clinical models, and ultimately for improvements in the use of these models to study diseases, choose therapies, and evaluate healthcare providers.

Conventional algorithms for modeling clinical events focus on characterizing the differences between patients with varying outcomes in the historical data sets used for model derivation. For many clinical conditions of low prevalence, where only small data sets are available, this approach to developing models is challenging because of the limited number of positive (that is, event) examples available for model training. Here, we investigate how the development of clinical models might be improved across three distinct patient populations (patients with acute coronary syndrome enrolled in the DISPERSE2-TIMI33 and MERLIN-TIMI36 trials, patients undergoing inpatient surgery in the National Surgical Quality Improvement Program registry, and patients undergoing percutaneous coronary intervention in the Blue Cross Blue Shield of Michigan Cardiovascular Consortium registry).
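The discrimination improvements described above are typically summarized by the c-statistic (area under the ROC curve). As a point of reference, the c-statistic can be computed directly from the rank-sum (Mann-Whitney) identity; this is a minimal sketch, and the function name `auc` is ours rather than from the study:

```python
def auc(scores, labels):
    """C-statistic via the Mann-Whitney identity: the probability that a
    randomly chosen event patient scores higher than a randomly chosen
    non-event patient, with ties counted as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A model that ranks every event patient above every non-event patient attains a c-statistic of 1.0; a model that ranks at random attains about 0.5.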
For each of these cases, we supplement an incomplete characterization of patient outcomes in the derivation data set (an uncensored view of the data) with an additional characterization of the extent to which patients differ from the statistical support of their clinical characteristics (a censored view of the data). Our approach exploits the same training data within the derivation cohort in multiple ways to improve predictive accuracy. We position this approach within the context of traditional supervised (2-class) and unsupervised (1-class) learning methods and present a 1.5-class approach for clinical decision-making. We describe a 1.5-class support vector machine (SVM) classification algorithm that implements this approach, and report its performance relative to logistic regression and to 2-class SVM classification with cost-sensitive weighting and oversampling. The 1.5-class SVM algorithm improved prediction accuracy relative to the other approaches and may have value in predicting clinical events both at the bedside and for risk-adjusted assessment of quality of care.
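The intuition behind the 1.5-class approach can be illustrated with a toy sketch. The published algorithm solves a joint SVM optimization; purely for illustration, this sketch blends a separately fitted supervised outcome score with a simple z-score-based measure of distance from the cohort's feature distribution. All names (`fit_logistic`, `novelty_score`, `risk`) and the blending weight `alpha` are our own assumptions, not taken from the paper:

```python
import math

# Toy illustration of the "1.5-class" idea: combine a supervised
# (2-class) outcome score with an unsupervised (1-class) measure of how
# far a patient lies from the derivation cohort's feature distribution.

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Supervised view: plain gradient-descent logistic regression."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            g = 1.0 / (1.0 + math.exp(-z)) - yi  # gradient of the log-loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def outcome_score(w, b, x):
    """Predicted event probability from the supervised view."""
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))

def fit_support(X):
    """Unsupervised view: per-feature mean and standard deviation of the
    whole cohort, ignoring outcome labels."""
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    sd = [max(1e-9, (sum((x[j] - mu[j]) ** 2 for x in X) / n) ** 0.5)
          for j in range(d)]
    return mu, sd

def novelty_score(mu, sd, x):
    """Mean squared z-score (distance from the cohort's support),
    squashed to (0, 1) so it is commensurate with the outcome score."""
    m = sum(((xj - mj) / sj) ** 2 for xj, mj, sj in zip(x, mu, sd)) / len(x)
    return m / (1.0 + m)

def risk(w, b, mu, sd, x, alpha=0.7):
    """1.5-class risk estimate: a weighted blend of the two views."""
    return alpha * outcome_score(w, b, x) + (1 - alpha) * novelty_score(mu, sd, x)
```

On a small cohort with rare events, a patient lying far outside the cohort's support receives an elevated risk estimate even where the supervised score alone is poorly informed, which is the behavior the censored view is meant to contribute.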

[1] Samprit Chatterjee, et al. A Nonparametric Approach to Credit Screening, 1970.

[2] E. DeLong, et al. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach, 1988, Biometrics.

[3] D. Hosmer, et al. Applied Logistic Regression, 1991.

[4] Vladimir Vapnik, et al. Statistical Learning Theory, 1998.

[5] Katharina Morik, et al. Combining Statistical Learning with a Knowledge-Based Approach - A Case Study in Intensive Care Monitoring, 1999, ICML.

[6] John Platt, et al. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, 1999.

[7] Bernhard Schölkopf, et al. Estimating the Support of a High-Dimensional Distribution, 2001, Neural Computation.

[8] V. L. Clark, et al. The Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) collaborative quality improvement initiative in percutaneous coronary interventions, 2002, Journal of Interventional Cardiology.

[9] Nitesh V. Chawla, et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.

[10] Jeffrey S. Simonoff, et al. Tree Induction vs. Logistic Regression: A Learning Curve Analysis, 2001, J. Mach. Learn. Res.

[11] F. C. Dougherty, et al. Predictors of 90-day outcome in patients stabilized after acute coronary syndromes, 2003, European Heart Journal.

[12] Nitesh V. Chawla, et al. Editorial: special issue on learning from imbalanced data sets, 2004, SIGKDD Explorations.

[13] S. Khuri, et al. The NSQIP: a new frontier in surgery, 2005, Surgery.

[14] William Stafford Noble, et al. Support vector machine, 2013.

[15] Bernhard Schölkopf, et al. Introduction to Semi-Supervised Learning, 2006, Semi-Supervised Learning.

[16] Hsuan-Tien Lin, et al. A note on Platt's probabilistic outputs for support vector machines, 2007, Machine Learning.

[17] C. Cannon, et al. Safety, tolerability, and initial efficacy of AZD6140, the first reversible oral adenosine diphosphate receptor antagonist, compared with clopidogrel, in patients with non-ST-segment elevation acute coronary syndrome: primary results of the DISPERSE-2 trial, 2007, Journal of the American College of Cardiology.

[18] A. Skene, et al. Effects of ranolazine on recurrent cardiovascular events in patients with non-ST-elevation acute coronary syndromes: the MERLIN-TIMI 36 randomized trial, 2007, JAMA.

[19] E. Braunwald, et al. Effects of Ranolazine on Disease-Specific Health Status and Quality of Life Among Patients With Acute Coronary Syndromes: Results from the MERLIN-TIMI 36 Randomized Trial, 2008, Circulation: Cardiovascular Quality and Outcomes.

[20] Filip De Turck, et al. Support vector machine versus logistic regression modeling for prediction of hospital mortality in critically ill patients with haematological malignancies, 2008, BMC Medical Informatics and Decision Making.

[21] J. Birkmeyer, et al. Prioritizing quality improvement in general surgery, 2008, Journal of the American College of Surgeons.

[22] M. Pencina, et al. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond, 2008, Statistics in Medicine.

[23] Zeeshan Syed, et al. Unsupervised Risk Stratification in Clinical Datasets: Identifying Patients at Risk of Rare Outcomes, 2010, ICML.