Linear Interpolation-Based Fuzzy Clustering Approach for Missing Data Handling

Clustering incomplete data sets that contain missing values is a common problem in the literature. Methods for handling this problem vary widely and include several imputation and non-imputation techniques for clustering. In this work, we analyze different approaches to handling missing data in clustering. The aim of this paper is to compare several fuzzy c-means (FCM) clustering approaches based on imputation and non-imputation strategies. Experimental results on one artificial data set and four real-world data sets from the UCI repository show that the linear interpolation-based FCM clustering approach performs significantly better than the other techniques on these data sets.
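
As a rough illustration of the imputation-then-cluster idea, the sketch below fills missing entries by linear interpolation and then runs standard FCM on the completed data. It is a minimal sketch under stated assumptions, not the procedure evaluated in the paper: the abstract does not specify how interpolation is applied, so interpolating each feature along the record order is an assumption, and the helper names `interpolate_missing` and `fcm`, the toy data, and all parameter defaults are hypothetical.

```python
import numpy as np

def interpolate_missing(X):
    """Fill NaNs in each feature by linear interpolation over the record index.

    Assumption: the record order is meaningful for interpolation.
    """
    X = np.asarray(X, dtype=float).copy()
    idx = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        col = X[:, j]                 # view into X, so writes update X in place
        known = ~np.isnan(col)
        if known.any():
            # np.interp holds the nearest known value constant beyond both ends
            col[~known] = np.interp(idx[~known], idx[known], col[known])
    return X

def fcm(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on complete data; returns memberships U and centers V."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted cluster centers
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)                   # guard against division by zero
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Hypothetical toy data: two groups of records, NaN marks a missing value.
X = np.array([
    [1.0, 2.0],
    [np.nan, 2.2],
    [1.4, 2.4],
    [8.0, 9.0],
    [8.2, np.nan],
    [8.4, 9.4],
])
X_filled = interpolate_missing(X)
U, V = fcm(X_filled, c=2)
labels = U.argmax(axis=1)  # hard assignment from the fuzzy memberships
```

Separating imputation from clustering keeps the FCM step unchanged, which is what distinguishes imputation strategies from non-imputation strategies that modify the clustering procedure itself to cope with missing entries.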
