In the modern intensive care unit (ICU), the physiologic state of critically ill patients is monitored through a diverse array of biosensors and laboratory measurements. The sheer volume of data collected has overwhelmed clinicians charged with assimilating the data and transforming it into clinical hypotheses. The development of automated algorithms with vigilant monitoring and clinical decision-support capabilities would help to alleviate this "information-overload" challenge. Inherent noise and measurement error add a further level of complication to the real-time analysis and interpretation of medical data. One class of "noise" in medical data can be characterized by the absence or unavailability of a desired measurement. We have analyzed a large collection of clinical laboratory data (blood chemistry, blood gases, complete blood counts) from over 600 ICU/CCU patients in the MIMIC II database. For each measurement, we analyzed the frequency of missing data values across patient records. Furthermore, we have developed a novel method to estimate the values of missing data using a weighted k-nearest neighbors algorithm. We propose a weighting scheme that exploits the correlation between a "missing" dimension and available data values from other fields. We compare our technique with several popular missing value estimation techniques: principal components analysis, least squares estimation, mean imputation, and classical k-nearest neighbors. The mean standardized imputation error ranges from a minimum of 0.31 to a maximum of 0.75, depending on the imputed dimension. The mean standardized imputation error over all dimensions is 0.45.
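As an illustration of the general idea of correlation-weighted k-nearest neighbors imputation, the following is a minimal sketch in Python with NumPy. It is not the method as implemented in this work: the abstract does not specify the exact weighting scheme, so the sketch assumes that each available field is weighted in the distance computation by the absolute Pearson correlation between that field and the missing dimension, and that the estimate is an inverse-distance weighted average over the k nearest records. The function names `correlation_weights` and `weighted_knn_impute`, the default `k`, and the fallback to mean imputation are all hypothetical choices made for the example.

```python
import numpy as np


def correlation_weights(X, target):
    """Absolute Pearson correlation of each column with column `target`,
    computed over rows where both values are observed (NaN = missing).
    Assumed weighting scheme, for illustration only."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    for j in range(n_features):
        if j == target:
            continue  # the missing dimension never contributes to distance
        both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, target])
        if both.sum() < 3:
            continue  # too few paired observations to estimate correlation
        a, b = X[both, j], X[both, target]
        if a.std() == 0 or b.std() == 0:
            continue  # correlation undefined for a constant column
        w[j] = abs(np.corrcoef(a, b)[0, 1])
    return w


def weighted_knn_impute(X, row, target, k=5):
    """Estimate X[row, target] from the k nearest rows that have it observed."""
    w = correlation_weights(X, target)
    query = X[row]

    # Candidate neighbors: every other row with the target field observed.
    candidates = np.where(~np.isnan(X[:, target]))[0]
    candidates = candidates[candidates != row]

    dists = np.full(candidates.shape, np.inf)
    for idx, i in enumerate(candidates):
        # Use only fields observed in both records and with nonzero weight.
        shared = ~np.isnan(query) & ~np.isnan(X[i]) & (w > 0)
        if shared.any():
            diff = query[shared] - X[i, shared]
            dists[idx] = np.sqrt(np.sum(w[shared] * diff**2) / np.sum(w[shared]))

    nearest = np.argsort(dists)[:k]
    nearest = nearest[np.isfinite(dists[nearest])]
    if nearest.size == 0:
        # Hypothetical fallback: plain mean imputation.
        return float(np.nanmean(X[:, target]))

    # Inverse-distance weighted average of the neighbors' target values.
    inv = 1.0 / (dists[nearest] + 1e-9)
    return float(np.sum(inv * X[candidates[nearest], target]) / np.sum(inv))
```

In practice the columns would typically be standardized before distances are computed, so that fields measured in different units contribute comparably. With this weighting, a field that is uncorrelated with the missing dimension has little influence on which records are chosen as neighbors, whereas classical k-nearest neighbors treats all available fields equally.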