Robust refinement of initial prototypes for partitioning-based clustering algorithms

Non-uniqueness of solutions and sensitivity to erroneous data are common problems in large-scale data clustering tasks. To avoid poor-quality solutions with partitioning-based clustering methods, robust estimates (that are highly insensitive to erroneous data values) are needed, and the initial cluster prototypes should be determined properly. In this paper, a robust density-estimation initialization method that uses the spatial median estimate in the prototype update is presented. Besides being insensitive to noise and outliers, the new method is also computationally comparable with other traditional methods. The methods are compared in numerical experiments on a set of synthetic and real-world data sets. Conclusions and a discussion of the results are given.
