Combining hierarchical clustering approaches using the PCA method

Abstract In expert systems, data mining methods are algorithms that simulate human problem-solving capabilities. Clustering methods, as unsupervised machine learning methods, are crucial for grouping similar samples into the same categories. Applying different clustering algorithms to a given dataset produces clusters of differing quality. Hence, many researchers have applied clustering combination methods, in which the outputs of several clustering algorithms are combined, to reduce the risk of choosing an inappropriate clustering algorithm. In prior work on combining hierarchical clusterings, the input hierarchical clusterings are transformed into descriptor matrices, and the combination is achieved by aggregating these descriptor matrices. Previous works have used only element-wise aggregation operators and have ignored the relations between the elements of each descriptor matrix, although the value of each element is meaningful only in comparison with the other elements of the same matrix. The current study proposes a novel method of combining hierarchical clusterings based on principal component analysis (PCA). Using PCA as the aggregator allows all elements of the descriptor matrices to be considered together. In the proposed approach, the base clusterings are built and transformed into descriptor matrices. Then, a final matrix is extracted from the descriptor matrices using PCA. Next, a final dendrogram, which summarizes the results of the diverse clusterings, is constructed from this matrix. Experimental results on popular, publicly available datasets show that the proposed method achieves higher clustering accuracy than basic clustering methods such as single, average, and centroid linkage, and than previous hierarchical clustering combination methods. In addition, statistical tests show that the proposed method significantly outperforms hierarchical clustering combination methods with element-wise averaging operators on almost all tested datasets. Several experiments also confirm the robustness of the proposed method with respect to its parameter settings.
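
The pipeline outlined in the abstract (base hierarchical clusterings, descriptor matrices, PCA aggregation, final dendrogram) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: the descriptor matrix is taken to be the cophenetic distance matrix of each base dendrogram, the "final matrix" is taken to be the projection of the stacked descriptors onto their first principal component, and the final dendrogram is built with average linkage. The function name `combine_hierarchies_pca` and its parameters are hypothetical, and this is not the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, fcluster


def combine_hierarchies_pca(X, methods=("single", "average", "centroid")):
    """Combine several base dendrograms with PCA (illustrative sketch only)."""
    descriptors = []
    for method in methods:
        Z = linkage(X, method=method)      # base hierarchical clustering
        c = cophenet(Z)                    # condensed cophenetic distances (assumed descriptor)
        descriptors.append(c / c.max())    # scale each descriptor to [0, 1]

    # Rows are object pairs, columns are base clusterings.
    D = np.column_stack(descriptors)

    # PCA via SVD of the column-centered matrix; Vt[0] holds the loadings
    # of the first principal component.
    Dc = D - D.mean(axis=0)
    _, _, Vt = np.linalg.svd(Dc, full_matrices=False)
    w = Vt[0]
    if w.sum() < 0:                        # resolve the sign ambiguity of PCA
        w = -w

    # Project every pair onto the first component, then shift so that the
    # result can be used as a non-negative condensed dissimilarity vector.
    combined = D @ w
    combined = combined - combined.min()

    # Final dendrogram built from the aggregated dissimilarities
    # (average linkage is an arbitrary choice here).
    Z_final = linkage(combined, method="average")
    return Z_final, w


# Small usage example on two synthetic, well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
Z_final, weights = combine_hierarchies_pca(X)
labels = fcluster(Z_final, 2, criterion="maxclust")
print(weights)
print(labels)
```

Unlike an element-wise average, the loadings in `weights` are derived from the joint variation of all descriptor entries, which is the property the abstract attributes to PCA as an aggregator.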