Different approaches to fuzzy clustering of incomplete datasets

Partially missing datasets are a common problem in data analysis. Since missing attribute values can arise for several distinct reasons, we suggest different approaches for dealing with them. For datasets in which feature values are missing completely at random, a variety of approaches have been proposed. In other situations, however, the fact that a value is missing itself carries information that is useful for classifying the dataset. Because the known approaches cannot exploit this information, we developed an extension of the Gath and Geva algorithm that can. We introduce a class-specific probability for missing values in order to assign incomplete data points to clusters appropriately. Benchmark datasets are used to demonstrate the capabilities of the presented approach.
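The idea of letting missingness itself inform cluster assignment can be illustrated with a small sketch. It is not the authors' formulation; it rests on stated assumptions. In a Gath-Geva-type step, the reciprocal of the distance is proportional to the cluster prior times a Gaussian density, so with fuzzifier m = 2 the memberships are normalized posteriors. The sketch assumes (1) an incomplete point is scored by the marginal Gaussian over its observed features only, and (2) each feature j is missing independently with a hypothetical class-specific probability q[i][j] for cluster i. The names `memberships`, `log_gauss_observed` and `miss_prob` are illustrative, and missing values are encoded as `None`.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_gauss_observed(x, mean, cov):
    """Log density of the marginal Gaussian over the observed features of x (None = missing)."""
    obs = [j for j, xj in enumerate(x) if xj is not None]
    if not obs:
        return 0.0  # nothing observed: the density term carries no information
    diff = [x[j] - mean[j] for j in obs]
    sub = [[cov[r][c] for c in obs] for r in obs]  # covariance restricted to observed features
    L = cholesky(sub)
    # Forward substitution solves L z = diff; the Mahalanobis distance is then z . z
    z = []
    for i in range(len(obs)):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    maha = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(obs)))
    return -0.5 * (maha + log_det + len(obs) * math.log(2.0 * math.pi))

def memberships(x, priors, means, covs, miss_prob):
    """Memberships (fuzzifier m = 2) of one possibly incomplete point.

    miss_prob[i][j] is an ASSUMED class-specific probability that feature j
    is missing for a point belonging to cluster i.
    """
    logs = []
    for i in range(len(priors)):
        lp = math.log(priors[i]) + log_gauss_observed(x, means[i], covs[i])
        # Missingness pattern likelihood: q if missing, (1 - q) if observed
        for j, xj in enumerate(x):
            q = miss_prob[i][j]
            lp += math.log(q) if xj is None else math.log(1.0 - q)
        logs.append(lp)
    top = max(logs)  # subtract the maximum for numerical stability
    w = [math.exp(l - top) for l in logs]
    s = sum(w)
    return [wi / s for wi in w]

# Usage: two unit-covariance clusters; cluster 1 often lacks feature 1
priors = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
miss_prob = [[0.01, 0.05], [0.01, 0.5]]
u = memberships([2.5, None], priors, means, covs, miss_prob)
```

The point [2.5, None] is equidistant from both means in its only observed feature, so the Gaussian term cannot separate the clusters. The missing second value shifts it toward cluster 1 (membership 10/11 under these assumed probabilities). In a full iteration, q[i][j] could plausibly be re-estimated as the membership-weighted fraction of points missing feature j. That update is likewise an assumption here.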
