Missing Categorical Data Imputation for FCM Clusterings of Mixed Incomplete Data

Data mining is related to human cognitive ability, and fuzzy clustering is one of its popular methods. The fuzzy c-means (FCM) clustering method is normally applied to numerical data. However, most data in databases are a mix of categorical and numerical values. To date, clustering methods have generally been developed to analyze only complete data. Data-intensive classification systems sometimes encounter data sets in which one or more feature values are missing (incomplete data), and traditional clustering methods cannot be used for such data. We therefore study clustering methods that can handle incomplete data with mixed numerical and categorical features. In this paper, we propose algorithms that combine a missing categorical data imputation method with distances between numerical data that contain missing values. Finally, we show through an experiment on real data that the proposed method is more effective than clustering without imputation when the missing ratio becomes higher.

Keywords: clustering; incomplete data; mixed data; FCM.
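As a rough illustration of the two ingredients named in the abstract, the sketch below shows one common way to compute a distance between a numerical record with missing values and a cluster center (the partial distance strategy, which rescales the sum over observed features), together with a simple categorical imputation. This is not the authors' algorithm: the abstract does not detail their imputation method, so most-frequent-category imputation is used here purely as a stand-in, and the function names `partial_sq_distance` and `impute_mode` are hypothetical.

```python
import numpy as np
from collections import Counter


def partial_sq_distance(x, v):
    """Squared Euclidean distance between record x and center v, computed
    over the observed (non-NaN) features of x and rescaled by
    (total features) / (observed features)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    observed = ~np.isnan(x)          # mask of features present in x
    n_obs = observed.sum()
    if n_obs == 0:
        raise ValueError("record has no observed numerical features")
    diff = x[observed] - v[observed]
    return float(x.size / n_obs * np.dot(diff, diff))


def impute_mode(values, missing=None):
    """Replace missing categorical entries with the most frequent observed
    category. Assumption: a stand-in for the paper's imputation method."""
    observed = [c for c in values if c is not missing]
    if not observed:
        raise ValueError("no observed categories to impute from")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if c is missing else c for c in values]


# Usage: the second feature of the record is missing.
d = partial_sq_distance([1.0, np.nan, 3.0], [0.0, 5.0, 1.0])
filled = impute_mode(["a", None, "b", "a"])
```

In an FCM loop, a distance of this kind would replace the ordinary squared Euclidean distance in the membership update, while the imputed categorical column could be handled by whatever categorical dissimilarity the clustering uses.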
