A dynamic ensemble approach to robust classification in the presence of missing data

Many real-world datasets suffer from missing or incomplete data. In healthcare, for example, patient measurements such as vital signs or lab values may be missing because of insufficient monitoring. When present, however, these features can be highly discriminative in predicting aspects of patient state, so it is desirable to incorporate these sparsely measured features into a predictive model. Training predictive algorithms on such datasets is complicated by the missing data. The problem is usually addressed by first estimating values for the missing data, a step known as data imputation. Without strong prior knowledge about the relationships between features, though, it is common to fill in missing values with the respective population mean or median. The accuracy of this approach is limited, and it may simply inject noise into the data. We propose a two-stage machine learning algorithm that learns a dynamic classifier ensemble from an incomplete dataset without data imputation. The algorithm is simple to implement and applicable across a wide range of problems. Our method first employs a variant of AdaBoost to learn a set of low-dimensional classifiers, each of which abstains from predicting if the feature(s) it depends on are missing. Our novel contribution is the second stage, dynamic ensemble learning, in which the low-dimensional classifiers are combined using a dynamic weighting that depends on the pattern of measured features in the current input. This makes the model resilient to missing data by adjusting the strength of individual classifiers to account for missing features. We apply our algorithm to early detection of hemodynamic instability in ICU patients. An effective risk score for hemodynamic instability has the potential to give the clinician sufficient time to intervene, thereby reducing the chance of organ damage due to insufficient blood perfusion. We compare the results of our algorithm with other common missing-data approaches, including mean imputation and multiple imputation, and discuss the advantages of the approach given the constraints of the application domain (e.g., the need for high specificity to combat hospital alarm fatigue).
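
The prediction step described above can be illustrated with a minimal sketch. The abstract does not give the functional form of the dynamic weighting, so everything here is an assumption made for illustration: the weak classifiers are univariate threshold stumps, the weight of each stump is a linear function of the measured-feature indicator vector, and all names and parameter values (`stump_outputs`, `dynamic_score`, `base_w`, `mask_w`, the thresholds) are hypothetical. The boosting stage that would learn the stumps is not shown.

```python
import numpy as np

def stump_outputs(x, thresholds, signs):
    """Univariate stumps: vote +1 or -1 when the feature is measured,
    and abstain (output 0) when the feature is NaN."""
    measured = ~np.isnan(x)
    # nan_to_num only keeps the comparison well defined; the vote for a
    # missing feature is discarded by the np.where below.
    votes = signs * np.where(np.nan_to_num(x) > thresholds, 1.0, -1.0)
    return np.where(measured, votes, 0.0)

def dynamic_score(x, thresholds, signs, base_w, mask_w):
    """Risk score sum_j w_j(m) * h_j(x), where m marks measured features.
    ASSUMPTION: w_j(m) = base_w[j] + mask_w[j] @ m (a linear form chosen
    for illustration, not taken from the paper)."""
    m = (~np.isnan(x)).astype(float)
    weights = base_w + mask_w @ m
    return float(weights @ stump_outputs(x, thresholds, signs))

# Hypothetical parameters for three features (values are made up).
thresholds = np.array([100.0, 2.0, 60.0])
signs = np.array([1.0, 1.0, -1.0])
base_w = np.array([0.5, 1.0, 0.8])
mask_w = np.zeros((3, 3))
mask_w[0, 1] = -0.2  # stump 0 is down-weighted when feature 1 is measured

x_full = np.array([110.0, 3.0, 55.0])
x_missing = np.array([110.0, np.nan, 55.0])

score_full = dynamic_score(x_full, thresholds, signs, base_w, mask_w)
score_missing = dynamic_score(x_missing, thresholds, signs, base_w, mask_w)
print(score_full, score_missing)
```

In this sketch, a missing feature has two effects: its own stump abstains, and the weights of the remaining stumps shift according to which features were measured. No imputed value ever enters the score.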
