Initial seed selection for K-modes clustering - A distance and density based approach

Abstract. In partitioning-based clustering algorithms, the initial seed artefacts play a vital role in the proper categorization of a given data set, so it is important to identify them well. We propose a density- and distance-based method that selects seed artefacts from different clusters, which leads to more accurate clustering results. Our algorithm iteratively refines the search for initial seed artefacts until the sum of the within-cluster errors, each normalized by the size of its cluster, reaches its minimum value. Because the initial artefacts are drawn from different clusters, this choice of seeds guarantees a globally optimal clustering solution. We compare our results with random initialization and with the initial seed selection methods of Wu, Cao and Khan to show the efficacy of our method.
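To make the idea of combining density with distance concrete, the following is a minimal Python sketch and not the algorithm proposed in this paper. It assumes categorical records compared by Hamming distance, takes the density of a record to be its average attribute-wise similarity to all records, and picks each further seed by maximizing density times the distance to the nearest seed already chosen. The function names (`hamming`, `density`, `select_seeds`, `normalized_within_error`) and the scoring rule are illustrative assumptions. The last function shows one plausible reading of the stopping criterion: the sum of the within-cluster errors, each divided by its cluster size.

```python
def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def density(point, data):
    """Average attribute-wise similarity of `point` to every record (assumed definition)."""
    m = len(point)
    return sum(m - hamming(point, other) for other in data) / (len(data) * m)


def select_seeds(data, k):
    """Pick k seeds: the densest record first, then dense records far from chosen seeds."""
    dens = [density(p, data) for p in data]
    seeds = [max(range(len(data)), key=lambda i: dens[i])]
    while len(seeds) < k:
        def score(i):
            # Assumed criterion: density weighted by distance to the nearest chosen seed.
            return dens[i] * min(hamming(data[i], data[s]) for s in seeds)
        candidates = (i for i in range(len(data)) if i not in seeds)
        seeds.append(max(candidates, key=score))
    return [data[i] for i in seeds]


def normalized_within_error(data, seeds):
    """Sum over clusters of (within-cluster error / cluster size), using nearest-seed assignment."""
    errors = {j: [] for j in range(len(seeds))}
    for p in data:
        j = min(range(len(seeds)), key=lambda s: hamming(p, seeds[s]))
        errors[j].append(hamming(p, seeds[j]))
    return sum(sum(e) / len(e) for e in errors.values() if e)


# Toy data set with two well-separated groups of categorical records.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"),
    ("b", "y", "r"), ("b", "y", "s"), ("b", "y", "r"),
]
seeds = select_seeds(data, k=2)
error = normalized_within_error(data, seeds)
```

On this toy data the two seeds come from different groups, which is the property the abstract attributes to the proposed method. An iterative variant could re-run the selection with perturbed candidates and keep the seed set with the smallest `normalized_within_error`.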