Transfer Clustering Ensemble Selection

A clustering ensemble (CE) combines multiple clustering solutions to improve the accuracy and robustness of the final result. To reduce redundancy and noise among these solutions, a CE selection (CES) step is often added to further enhance performance. Quality and diversity are the two key criteria in CES. However, most CES strategies rely on heuristic selection methods or a threshold parameter to achieve a tradeoff between quality and diversity. In this paper, we propose a transfer CES (TCES) algorithm that learns the relationship between quality and diversity on a source dataset and transfers it to a target dataset by means of three objective functions. Furthermore, a multiobjective self-evolutionary process is designed to optimize these three objective functions. Finally, we construct a transfer CE framework (TCE-TCES) based on TCES to obtain better clustering results. Experimental results on 12 transfer clustering tasks derived from the 20newsgroups dataset show that TCE-TCES finds a better tradeoff between quality and diversity and obtains more desirable clustering results.
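To make the quality/diversity tradeoff concrete, the following is a minimal sketch of one common way to score a candidate subset of base clusterings. It is an illustration under stated assumptions, not the TCES objective functions, which the abstract does not specify. The assumptions are: quality is the mean normalized mutual information (NMI) between a member and the other ensemble members, diversity is one minus the mean pairwise NMI inside the subset, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI between two label vectors, normalized by sqrt(H(a) * H(b))."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(
        (n_ij / n) * log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: at least one partition has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mi / sqrt(h_a * h_b)


def member_quality(ensemble, k):
    """Assumed quality proxy: mean NMI of member k with all other members."""
    others = [p for idx, p in enumerate(ensemble) if idx != k]
    return sum(nmi(ensemble[k], p) for p in others) / len(others)


def subset_quality(ensemble, subset):
    """Mean member quality over the selected indices."""
    return sum(member_quality(ensemble, k) for k in subset) / len(subset)


def subset_diversity(ensemble, subset):
    """Assumed diversity proxy: 1 - mean pairwise NMI within the subset."""
    pairs = list(combinations(subset, 2))
    return 1.0 - sum(nmi(ensemble[i], ensemble[j]) for i, j in pairs) / len(pairs)


# Toy ensemble: members 0 and 1 are the same partition up to relabeling,
# member 2 is statistically independent of both.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
for subset in [(0, 1), (0, 2)]:
    print(subset, subset_quality(ensemble, subset), subset_diversity(ensemble, subset))
```

In this toy case the subset (0, 1) scores higher on quality but has no diversity, while (0, 2) is maximally diverse but lower in quality. This is the kind of conflict that heuristic or threshold-based CES methods resolve by hand and that a multiobjective search can explore directly.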
