Mesh Based Clustering without Stopping Criterion

Clustering in data mining is a discovery process that groups a set of data such that the intra-cluster similarity is maximized and the inter-cluster similarity is minimized. Existing clustering algorithms, such as k-means, can break down if the parameters of their static model are chosen incorrectly for the data set being clustered, or if the model is not adequate to capture the characteristics of the clusters. Furthermore, most of these algorithms break down when the data consists of clusters of diverse shapes, densities, and sizes. In this paper, a novel clustering algorithm is presented that clusters a data set in O(n) time and O(n) space without requiring a stopping criterion to be specified for the data set (unlike k-means, where the value of k must be given explicitly). The algorithm first normalizes the data set and then constructs a mesh that covers the whole data set. Each point in the data set is assigned the number of the box that contains it, and these boxes are then clustered instead of the individual points. The algorithm does not use any distance measure, such as Euclidean distance, to cluster points. Besides serving as an independent clustering algorithm, it can be used to obtain an upper bound on the number of clusters to be formed by other algorithms such as k-means and ISODATA.
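
The pipeline outlined above (normalize, build a mesh, assign box numbers, cluster the boxes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how boxes are merged, so the rule used here (occupied boxes that touch, including diagonally, belong to the same cluster) is an assumption, as are the function name `mesh_cluster` and the mesh resolution parameter `cells_per_axis`.

```python
from collections import deque
from itertools import product


def mesh_cluster(points, cells_per_axis=10):
    """Return one cluster label per point by clustering occupied mesh boxes.

    `cells_per_axis` (hypothetical parameter) sets the mesh resolution.
    """
    if not points:
        return []
    dim = len(points[0])

    # Step 1: min-max normalization bounds for every coordinate.
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]

    # Step 2: map each point to the index tuple of the box containing it.
    def box_of(p):
        return tuple(
            min(int((p[d] - lows[d]) / spans[d] * cells_per_axis),
                cells_per_axis - 1)
            for d in range(dim)
        )

    boxes = [box_of(p) for p in points]
    occupied = set(boxes)

    # Step 3 (assumed merge rule): flood-fill over touching occupied boxes.
    # No distance between points is ever computed.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    label_of_box = {}
    next_label = 0
    for start in boxes:
        if start in label_of_box:
            continue
        label_of_box[start] = next_label
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for off in offsets:
                neighbor = tuple(c + o for c, o in zip(current, off))
                if neighbor in occupied and neighbor not in label_of_box:
                    label_of_box[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1

    return [label_of_box[b] for b in boxes]


points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9)]
labels = mesh_cluster(points)  # [0, 0, 0, 1, 1]
```

In this sketch the number of clusters falls out of the mesh rather than being supplied up front, and for a fixed dimension each point and each occupied box is visited a bounded number of times, which is consistent with the linear time and space claimed above. The result does depend on the chosen mesh resolution.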