Clustering aggregation based on genetic algorithm for documents clustering

The clustering aggregation problem is a formal description of the clustering ensemble problem. Techniques for solving it can be used to construct a clustering division with better clustering performance when the performance of the original clustering divisions is fluctuating or weak. This paper presents GeneticCA, an approach to the clustering aggregation problem based on a genetic algorithm. To estimate the clustering performance of a clustering division, clustering precision is defined and its properties are discussed. In our experiments on document clustering, a Hamming neural network is used to construct clustering divisions with fluctuating and weak clustering performance. The experimental results show that, with clustering precision as the criterion, the clustering division constructed by GeneticCA performs better than the original clustering divisions.
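The abstract does not give GeneticCA's encoding, genetic operators, or fitness function, so the following is only a minimal sketch of how a genetic algorithm could be applied to clustering aggregation, not the authors' method. It assumes the commonly used pairwise-disagreement objective (count the object pairs that two clusterings treat differently) in place of the paper's clustering precision, and the names `disagreement`, `aggregation_cost`, and `genetic_aggregate`, along with every parameter, are hypothetical.

```python
import random
from itertools import combinations


def disagreement(a, b):
    """Count object pairs that one clustering groups together and the other separates."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def aggregation_cost(candidate, clusterings):
    """Total pairwise disagreement between a candidate and all input clusterings."""
    return sum(disagreement(candidate, c) for c in clusterings)


def genetic_aggregate(clusterings, k, generations=50, pop_size=20,
                      mutation_rate=0.1, seed=0):
    """Search for a low-cost consensus label vector (all settings are assumptions)."""
    rng = random.Random(seed)
    n = len(clusterings[0])
    # Start from the input clusterings, then pad with random labelings.
    population = [list(c) for c in clusterings]
    while len(population) < pop_size:
        population.append([rng.randrange(k) for _ in range(n)])

    def cost(individual):
        return aggregation_cost(individual, clusterings)

    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # elitist selection: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover on the label vectors
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign each object to a random cluster with small probability.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=cost)


# Three input clusterings of six documents, one label per document.
clusterings = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
consensus = genetic_aggregate(clusterings, k=3)
```

Because the best half of each generation is carried over unchanged and the inputs are part of the initial population, the returned division never has a higher cost than the best input clustering under this objective. One-point crossover on raw label vectors ignores the fact that cluster labels are arbitrary across parents, which a more careful design would have to address.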
