Multigranulation information fusion: A Dempster-Shafer evidence theory based clustering ensemble method

As an important reflection of human cognitive ability, multigranulation analysis yields more reasonable solutions to a problem than single-granulation analysis. Clustering analysis is an active area of machine learning and a fundamental technique for information granulation. By using different clustering algorithms, or different parameter settings of one algorithm, a data set can be granulated into multiple granular spaces. A clustering ensemble over these granular spaces is an effective strategy for multigranulation information fusion. Existing clustering ensemble algorithms can be categorized into three types: feature-based methods, combinatorial methods, and graph-based methods. Because each type has its own advantages and disadvantages, combining their advantages can be expected to yield better granulation results. Based on this consideration, this paper introduces a Dempster-Shafer evidence theory based clustering ensemble method that combines the advantages of the combinatorial and graph-based methods. In this method, the definition of the mass functions considers the neighbors of an object using graph binarization, and the final clustering ensemble result is generated by applying Dempster's combination rule. The form of Dempster's combination rule makes the algorithm conform to the pattern of the combinatorial methods. Experimental results on fourteen numerical real-world data sets from the UCI Machine Learning Repository show that the proposed method outperforms seven other clustering ensemble methods.
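For readers unfamiliar with Dempster's combination rule, the following is a minimal sketch of the standard rule for two mass functions over a common frame of discernment. It is not the paper's algorithm: the function name `dempster_combine`, the cluster labels `c1` and `c2`, and the mass values are hypothetical and chosen only for illustration. In particular, the paper's neighbor-based definition of the mass functions is not reproduced here.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset (a focal element,
    i.e. a subset of the frame of discernment) to its mass. The masses
    in each dict are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence: mass goes to the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Normalize by 1 - K so the combined masses sum to 1 again.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical evidence from two base clusterings about one object's label.
frame = frozenset({"c1", "c2"})
m1 = {frozenset({"c1"}): 0.6, frame: 0.4}
m2 = {frozenset({"c2"}): 0.5, frame: 0.5}

combined = dempster_combine(m1, m2)
# Assign the object to the singleton cluster with the largest combined mass.
best = max((f for f in combined if len(f) == 1), key=combined.get)
```

In this toy example the two sources partly disagree, so the conflicting mass is discarded and the remainder is renormalized. Because the rule is associative and commutative, evidence from more than two base clusterings can be folded in one source at a time.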
