A New Insight Into Missing Data in Intensive Care Unit Patient Profiles: Observational Study

Background: Missing data in the profiles of intensive care unit (ICU) patients are substantial and unavoidable. However, this incompleteness is not always random, nor is it always caused by imperfections in the data collection process.

Objective: This study aimed to investigate the potential hidden information in data missing from electronic health records (EHRs) in an ICU and to examine whether the presence or missingness of a variable can itself convey information about a patient's health status.

Methods: Daily retrieval of laboratory test (LT) measurements from the Medical Information Mart for Intensive Care III (MIMIC-III) database was set as the reference for defining complete patient profiles. Missingness indicators were introduced to represent the presence or absence of each LT in a patient profile. Various feature selection methods (filter and embedded methods) were then used to examine the predictive power of the missingness indicators. Finally, a set of well-known prediction models (logistic regression [LR], decision tree, and random forest) was used to evaluate whether the absence of a variable recording can itself provide predictive power. We also examined the utility of missingness indicators in improving predictive performance when used together with observed laboratory measurements as model input. The outcomes of interest were in-hospital mortality and mortality at 30 days after ICU discharge.

Results: Regardless of mortality type or ICU day, more than 40% of the predictors selected by the feature selection methods were missingness indicators. Notably, using missingness indicators as the only predictors achieved reasonable mortality prediction on all days and for both mortality types (for instance, in 30-day mortality prediction with LR, the area under the receiver operating characteristic curve [AUROC] was 0.6836±0.012). Including indicators alongside observed measurements in the prediction models also improved the AUROC; the maximum improvement was 0.0426. Indicators also improved the AUROC of the Simplified Acute Physiology Score II (SAPS II) model, a well-known ICU severity-of-illness score, confirming the additive information of the indicators (AUROC of 0.8045±0.0109 for 30-day mortality prediction with LR).

Conclusions: Our study demonstrated that the presence or absence of LT measurements is informative and can be considered a potential predictor of in-hospital and 30-day mortality. The comparative analysis of prediction models also showed a statistically significant improvement in prediction when indicators were included. Moreover, missing data might reflect the judgment of the examining clinicians. Therefore, the absence of measurements can be informative in ICUs and has predictive power beyond the measured data themselves. This initial case study shows promise for more in-depth analysis of missing data and its informativeness in ICUs. Future studies are needed to generalize these results.
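To make the idea of a missingness indicator concrete, the following is a minimal sketch in plain Python, not the study's pipeline. The test names in `LAB_TESTS`, the toy patient profiles, and the outcome labels are all hypothetical and synthetic; they are not drawn from MIMIC-III. Instead of the LR, decision tree, or random forest models named in the abstract, the sketch scores each patient by the number of tests that were measured, which is an assumption made only to keep the example self-contained, and then computes the AUROC of that score directly from its rank-based definition.

```python
from typing import Dict, List, Optional

# Hypothetical panel of laboratory tests; the names are illustrative only.
LAB_TESTS = ["lactate", "bun", "rdw", "anion_gap"]


def missingness_indicators(profile: Dict[str, Optional[float]]) -> Dict[str, int]:
    """Return 1 for each test that is absent from a daily profile, else 0."""
    return {f"{test}_missing": int(profile.get(test) is None) for test in LAB_TESTS}


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic toy profiles (None marks a test that was not measured that day).
profiles = [
    {"lactate": 4.1, "bun": 38.0, "rdw": 16.2, "anion_gap": 18.0},
    {"lactate": None, "bun": 14.0, "rdw": None, "anion_gap": 11.0},
    {"lactate": 2.9, "bun": 30.0, "rdw": None, "anion_gap": None},
    {"lactate": None, "bun": 12.0, "rdw": None, "anion_gap": None},
    {"lactate": None, "bun": None, "rdw": 13.1, "anion_gap": 10.0},
]
died = [1, 0, 1, 0, 0]  # synthetic outcome labels

indicators = [missingness_indicators(p) for p in profiles]

# Assumed toy score: the number of tests that WERE measured.
scores = [len(LAB_TESTS) - sum(ind.values()) for ind in indicators]

toy_auroc = auroc(scores, died)
print(indicators[1])
print(f"toy AUROC from presence counts alone: {toy_auroc:.4f}")
```

The score here uses no measured values at all, only whether each test is present, which is the sense in which indicators can be evaluated as the only predictors. In a fuller analysis the indicator columns would be passed to a fitted model, alone or alongside the observed measurements, and the AUROC would be estimated on held-out data.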
