Clustering validity based on the most similarity

One basic requirement of many studies is the need to classify data. Clustering is one proposed method for summarizing networks. Clustering methods can be divided into two categories: model-based approaches and algorithmic approaches. Since most clustering methods depend on their input parameters, it is important to evaluate the results a clustering algorithm produces under different input parameters in order to choose the most appropriate one. Several clustering validity techniques, based on the inner density and outer density of clusters, provide different metrics for choosing the most appropriate clustering independently of the input parameters. Given the dependence of previous methods on the input parameters, one challenge in large systems is that data are completed incrementally, which affects the final choice of the most appropriate clustering. Those methods take high density within a cluster and low density between different clusters as the criterion for choosing the optimal clustering. This criterion has a serious drawback: not all of the data are available at the first stage. In this paper, we introduce an efficient measure based on the clustering that is repeated the maximum number of times across various initial values.
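A minimal sketch of the proposed idea, under stated assumptions: it reads "maximum number of repetitions for various initial values" as picking the partition that a randomly initialized algorithm returns most often across many seeds. Plain 1-D k-means is only a stand-in for whatever clustering algorithm the paper uses, and the names `kmeans_1d`, `canonical`, and `most_repeated_partition` are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def kmeans_1d(points, k, seed, iters=20):
    """Hypothetical stand-in algorithm: plain k-means on 1-D points.

    The seed fixes which points are picked as initial centroids, so
    different seeds play the role of "various initial values".
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda c, p=p: abs(p - centroids[c])) for p in points]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def canonical(labels):
    """Label-invariant form of a partition: a frozenset of frozensets of point indices."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


def most_repeated_partition(points, k, seeds):
    """Run the algorithm once per seed and return the partition seen most often,
    together with the fraction of runs that produced it."""
    seeds = list(seeds)
    counts = Counter(canonical(kmeans_1d(points, k, s)) for s in seeds)
    partition, hits = counts.most_common(1)[0]
    return partition, hits / len(seeds)


if __name__ == "__main__":
    data = [0.0, 0.1, 10.0, 10.1]
    best, share = most_repeated_partition(data, k=2, seeds=range(20))
    print(sorted(sorted(cluster) for cluster in best), share)
    # → [[0, 1], [2, 3]] 1.0
```

Partitions are compared as sets of index sets rather than as label vectors, because k-means labels are arbitrary: [0, 0, 1, 1] and [1, 1, 0, 0] describe the same clustering and should count as one repetition. The returned fraction indicates how dominant the most repeated partition is; this sketch does not attempt to compare repetition counts across different values of k.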