Personalized Mortality Prediction Driven by Electronic Medical Data and a Patient Similarity Metric

Background: Clinical outcome prediction normally employs static, one-size-fits-all models that perform well for the average patient but are sub-optimal for individual patients with unique characteristics. In the era of digital healthcare, it is feasible to dynamically personalize decision support by identifying and analyzing similar past patients, in a way that is analogous to personalized product recommendation in e-commerce. Our objectives were: 1) to prove that analyzing only similar patients leads to better outcome prediction performance than analyzing all available patients, and 2) to characterize the trade-off between training data size and the degree of similarity between the training data and the index patient for whom the prediction is to be made.

Methods and Findings: We applied a cosine-similarity-based patient similarity metric (PSM) to an intensive care unit (ICU) database to identify the patients most similar to each index patient and then custom-built a 30-day mortality prediction model for that patient. Rich clinical and administrative data from the first day in the ICU, drawn from 17,152 adult ICU admissions, were analyzed. The results confirmed that training on data from only a small subset of the most similar patients improves predictive performance compared with training on data from all available patients. The results also showed that when too few similar patients are used for training, predictive performance degrades because of the small sample size. Our PSM-based approach outperformed well-known ICU severity of illness scores. Although the improved prediction performance comes at the cost of increased computational burden, Big Data technologies can help realize personalized data-driven decision support at the point of care.

Conclusions: The present study provides crucial empirical evidence for the promising potential of personalized data-driven decision support systems. With the increasing adoption of electronic medical record (EMR) systems, our novel medical data analytics contributes to the meaningful use of EMR data.
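The PSM-based procedure summarized in the abstract (rank past patients by cosine similarity to the index patient, then build a prediction from only the most similar ones) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names, the two-feature toy data, and the value of k are hypothetical, and the predictor is deliberately the simplest one possible (the observed death rate among the k most similar patients), standing in for whatever model is custom-built in practice.

```python
import numpy as np


def cosine_similarity_to_index(X, x_index):
    """Cosine similarity between each row of X and the index patient's vector."""
    X = np.asarray(X, dtype=float)
    x_index = np.asarray(x_index, dtype=float)
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(x_index)
    # Avoid division by zero: an all-zero vector gets similarity 0.
    norms = np.where(norms == 0.0, np.inf, norms)
    return (X @ x_index) / norms


def most_similar_patients(X, x_index, k):
    """Indices of the k past patients most similar to the index patient."""
    sims = cosine_similarity_to_index(X, x_index)
    order = np.argsort(-sims, kind="stable")  # descending similarity
    return order[:k]


def predict_mortality(X, y, x_index, k):
    """Illustrative predictor: death rate among the k most similar patients."""
    idx = most_similar_patients(X, x_index, k)
    return float(np.mean(np.asarray(y, dtype=float)[idx]))


# Hypothetical toy data: four past patients, two features each,
# with y = 1 meaning death within 30 days.
X_past = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y_past = [1, 1, 0, 0]
x_new = [1.0, 0.05]

risk_similar = predict_mortality(X_past, y_past, x_new, k=2)  # similar patients only
risk_all = predict_mortality(X_past, y_past, x_new, k=4)      # all available patients
print(risk_similar, risk_all)
```

In this toy setting, restricting to the two most similar patients yields a risk estimate of 1.0, whereas using all four patients yields 0.5. Varying k is the knob behind the trade-off the abstract describes: a small k keeps the training set similar to the index patient but small, while a large k adds data at the cost of similarity.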
