A fuzzy c-means clustering algorithm based on nearest-neighbor intervals for incomplete data

Incomplete data sets, in which some attribute values are missing, are a common problem in cluster analysis. In this paper, missing attributes are represented as intervals, and a novel fuzzy c-means algorithm for incomplete data based on nearest-neighbor intervals is proposed. The algorithm estimates the nearest-neighbor interval representation of each missing attribute by making full use of the attribute distribution information of the data set, which can enhance the robustness of missing-attribute imputation compared with other numerical imputation methods. In addition, the convex hyper-polyhedrons formed by the interval prototypes can represent the uncertainty of the missing attributes and, at the same time, reflect the shape of the clusters to some degree, which helps make the clustering analysis more robust. Comparisons and analysis of experimental results on several UCI data sets demonstrate the capability of the proposed algorithm.
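The idea of representing a missing attribute as a nearest-neighbor interval can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function names `partial_distance` and `nearest_neighbor_intervals`, the neighbor count `q`, the use of a partial (rescaled Euclidean) distance over shared observed attributes, and the choice of the neighbors' minimum and maximum as interval endpoints. The clustering step on the resulting interval data is not shown.

```python
import math

def partial_distance(a, b):
    """Distance over the attributes observed in both samples (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare, treat as infinitely far
    # Rescale so that samples with fewer shared attributes are comparable.
    scale = len(a) / len(shared)
    return math.sqrt(scale * sum((x - y) ** 2 for x, y in shared))

def nearest_neighbor_intervals(data, q=2):
    """Turn every attribute into an interval (lo, hi).

    Observed values become degenerate intervals (v, v). A missing value
    becomes the min/max of that attribute over the q nearest samples
    (assumed rule) that have the attribute observed.
    """
    result = []
    for i, sample in enumerate(data):
        row = []
        for j, value in enumerate(sample):
            if value is not None:
                row.append((value, value))
                continue
            # Candidate neighbors: other samples with attribute j observed.
            candidates = [
                (partial_distance(sample, other), other[j])
                for k, other in enumerate(data)
                if k != i and other[j] is not None
            ]
            if not candidates:
                raise ValueError(f"attribute {j} is missing in every other sample")
            candidates.sort(key=lambda c: c[0])
            neighbors = [v for _, v in candidates[:q]]
            row.append((min(neighbors), max(neighbors)))
        result.append(row)
    return result

# Toy data (made up for illustration): the second sample lacks attribute 1.
data = [[1.0, 2.0], [1.1, None], [1.2, 2.4], [5.0, 9.0]]
intervals = nearest_neighbor_intervals(data, q=2)
print(intervals[1])  # [(1.1, 1.1), (2.0, 2.4)]
```

In this toy example the distant sample `[5.0, 9.0]` does not influence the interval, which is the sense in which neighbor-based intervals follow the local attribute distribution rather than a global statistic such as the column mean.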
