Paper Information - Mesh Based Clustering without Stopping Criterion

Mesh Based Clustering without Stopping Criterion

Clustering in data mining is a discovery process that groups a set of data so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. Existing clustering algorithms, such as K-means, are designed to find clusters, but they can break down if the parameters of the static model are chosen incorrectly for the data set being clustered, or if the model is not adequate to capture the characteristics of the clusters. Furthermore, most of these algorithms break down when the data consists of clusters of diverse shapes, densities and sizes. This paper presents a novel clustering algorithm that clusters a data set in O(n) time and O(n) space without requiring a stopping criterion to be specified for the data set (unlike k-means, where the value of k must be given explicitly). The algorithm first normalizes the data set and then constructs a mesh that covers the whole data set. Each point is assigned the number of the box that contains it, and these boxes are then clustered instead of the individual points. The algorithm does not use any distance measure, such as Euclidean distance, to cluster points. Besides serving as a standalone clustering algorithm, it can provide an upper bound on the number of clusters for other algorithms such as k-means and ISODATA.
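The steps named in the abstract (normalize, build a mesh, assign box numbers, cluster the boxes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how boxes are merged, so the sketch assumes that occupied boxes touching each other (including diagonally) belong to the same cluster. The function name `mesh_cluster` and the parameter `cells_per_axis` are hypothetical.

```python
from collections import deque
from itertools import product


def mesh_cluster(points, cells_per_axis=10):
    """Return one cluster label per point by grouping adjacent occupied mesh boxes."""
    if not points:
        return []
    dims = len(points[0])

    # Step 1: min-max normalization bounds for every coordinate.
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]

    def box_of(p):
        # Step 2: map a point to the integer index of the mesh box containing it.
        return tuple(
            min(int((p[d] - lows[d]) / spans[d] * cells_per_axis), cells_per_axis - 1)
            for d in range(dims)
        )

    boxes = [box_of(p) for p in points]
    occupied = set(boxes)

    # Step 3: cluster boxes instead of points.
    # ASSUMPTION (not stated in the abstract): touching occupied boxes merge.
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    box_label = {}
    next_label = 0
    for start in boxes:  # labels are numbered in order of first appearance
        if start in box_label:
            continue
        box_label[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first flood fill over neighbouring boxes
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cur, off))
                if nb in occupied and nb not in box_label:
                    box_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    return [box_label[b] for b in boxes]


sample = [(0.0, 0.0), (0.1, 0.1), (0.05, 0.02), (5.0, 5.0), (5.1, 4.9), (4.95, 5.05)]
labels = mesh_cluster(sample, cells_per_axis=10)
cluster_count = len(set(labels))
```

No pairwise distance is computed here: membership depends only on box indices. Under the same assumptions, `cluster_count` is the kind of value that could be passed as k to an algorithm such as k-means, in the spirit of the upper-bound use the abstract mentions. The mesh resolution is still a choice made in this sketch; the abstract does not say how the paper selects it.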
