Robust clustering by pruning outliers

In many applications of C-means clustering, the given data set contains noisy points. These noisy points distort the resulting clusters, especially when they lie far from the rest of the data. In this paper, we develop a pruning approach for robust C-means clustering. The approach identifies and prunes outliers based on the sizes and shapes of the clusters, so that the resulting clusters are affected by the outliers as little as possible. The pruning approach is general and can improve the robustness of many existing C-means clustering methods. In particular, we apply it to hard C-means clustering, fuzzy C-means clustering, and deterministic-annealing C-means clustering, obtaining a robust version of each of these three algorithms. In addition, we integrate the pruning approach with the fuzzy approach and the possibilistic approach to design two new algorithms for robust C-means clustering. Numerical results demonstrate that the pruning approach achieves good robustness.
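
To make the idea concrete, the following is a minimal sketch of how pruning might be combined with hard C-means. The abstract does not state the actual pruning criterion, so the rule used here is an assumption chosen only for illustration: a point is pruned when its Mahalanobis distance to its cluster, computed from the cluster's current mean and covariance (a stand-in for "size and shape"), exceeds a threshold. The function name `pruned_hard_c_means` and the parameters `threshold`, `n_iter`, and `ridge` are hypothetical and do not come from the paper.

```python
import numpy as np


def pruned_hard_c_means(X, init_centers, threshold=3.0, n_iter=20, ridge=1e-9):
    """Hard C-means with an ASSUMED Mahalanobis-distance pruning rule.

    Returns (centers, labels, keep), where keep[i] is False for pruned points.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    n, d = X.shape
    keep = np.ones(n, dtype=bool)   # True = point currently counts as an inlier
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Hard assignment: every point goes to its nearest center.
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)

        new_keep = np.ones(n, dtype=bool)
        for k in range(centers.shape[0]):
            members = np.flatnonzero(labels == k)
            # Estimate size and shape from members kept in the previous pass.
            trusted = members[keep[members]]
            if trusted.size <= d:
                trusted = members
            if trusted.size <= d:
                # Too few points for a covariance estimate: no pruning here.
                if members.size:
                    centers[k] = X[members].mean(axis=0)
                continue

            mean = X[trusted].mean(axis=0)
            cov = np.cov(X[trusted], rowvar=False) + ridge * np.eye(d)

            # Squared Mahalanobis distance of every member to the cluster.
            diff = X[members] - mean
            maha2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)

            # Assumed pruning rule: drop members beyond the threshold.
            ok = maha2 <= threshold ** 2
            new_keep[members] = ok
            if ok.any():
                centers[k] = X[members[ok]].mean(axis=0)
        keep = new_keep

    return centers, labels, keep
```

In this sketch pruning is reversible: every point is re-evaluated in each pass, so a point pruned early can rejoin a cluster once the centers have settled. The same wrapper structure could in principle be placed around fuzzy or deterministic-annealing updates, but that extension is not shown here.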
