Predicting septic shock outcomes in a database with missing data using fuzzy modeling: Influence of pre-processing techniques on real-world data-based classification

Real-world databases often contain missing data, and existing correction algorithms deliver varying performance. Moreover, most modeling techniques cannot handle missing data automatically. In this study, we examine different approaches to predicting septic shock outcomes in the presence of missing data. Common pre-processing techniques for managing missing data either discard data or replace it with information that, by design, introduces bias. We show that predictive performance improves by employing a minimal pre-processing technique, the Zero-Order-Hold (ZOH) method, by applying Fuzzy C-Means clustering based on the partial distance strategy (FCM-PDS), and by computing the final classification over the samples from each patient. Performance improvements persist with up to approximately 60% of the data missing; at higher percentages, the improvement in classification performance is still statistically significant. We further validate this approach through comparisons with previous studies.
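As a rough illustration of the three ingredients named above, the following Python sketch shows a zero-order hold over one patient's time series, the squared partial distance used when clustering incomplete data, and a patient-level decision. It is a minimal sketch under stated assumptions, not the study's implementation: the function names, the use of NaN to mark missing values, the rescaling factor in the partial distance (total features divided by observed features, a common formulation), and the mean-and-threshold aggregation rule are all assumptions made for illustration.

```python
import numpy as np


def zero_order_hold(series):
    """Carry the last observed value forward over NaN gaps.

    Leading NaNs have no earlier observation, so they are left as NaN.
    """
    filled = np.array(series, dtype=float)
    for t in range(1, len(filled)):
        if np.isnan(filled[t]):
            filled[t] = filled[t - 1]
    return filled


def partial_distance_sq(x, center):
    """Squared partial distance between a sample and a cluster center.

    Only features observed in x contribute; the sum is rescaled by
    (total features / observed features). Assumed formulation.
    """
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    observed = ~np.isnan(x)
    n_observed = int(observed.sum())
    if n_observed == 0:
        raise ValueError("sample has no observed features")
    diff = x[observed] - center[observed]
    return x.size / n_observed * float(np.sum(diff ** 2))


def classify_patient(sample_scores, threshold=0.5):
    """Hypothetical aggregation: average the per-sample scores of one
    patient and threshold the mean to obtain a single class label."""
    return int(np.mean(sample_scores) >= threshold)


if __name__ == "__main__":
    nan = np.nan
    print(zero_order_hold([1.0, nan, nan, 4.0]))
    print(partial_distance_sq([1.0, nan, 3.0], [0.0, 5.0, 1.0]))
    print(classify_patient([0.2, 0.9, 0.7]))
```

The zero-order hold changes the data as little as possible, since it only repeats values that were actually measured, while the partial distance lets the clustering step use samples that remain incomplete after the hold.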
