Distribution-Based Cluster Structure Selection

The objective of a cluster structure ensemble is to find a unified cluster structure from multiple cluster structures obtained from different datasets. However, not all cluster structures contribute useful information to the unified cluster structure. This paper investigates how to select suitable cluster structures from the ensemble so that they can be summarized into a more representative cluster structure. Specifically, each cluster structure is first represented by a mixture of Gaussian distributions whose parameters are estimated with the expectation-maximization (EM) algorithm. Next, several distribution-based distance functions are designed to evaluate the similarity between two cluster structures. Based on these similarity comparisons, we propose the distribution-based cluster structure ensemble (DCSE) framework to find the most representative unified cluster structure. We then design the distribution-based cluster structure selection strategy (DCSSS) to select a subset of cluster structures. Finally, we propose a distribution-based normalized hypergraph cut algorithm to generate the final result. In our experiments, a nonparametric test is adopted to evaluate the differences between DCSE and its competitors. We evaluate the proposed methods on 20 real-world datasets from the University of California, Irvine (UCI) and Knowledge Extraction based on Evolutionary Learning (KEEL) repositories, as well as on a number of cancer gene expression profiles. The experimental results show that 1) DCSE works well on the real-world datasets, and 2) DCSE combined with DCSSS can further improve performance.
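The abstract does not specify which distribution-based distance functions DCSE uses or how DCSSS ranks cluster structures. The Python sketch below illustrates the general idea under several stated assumptions, and none of its names come from the paper. Each cluster structure is summarized as a Gaussian mixture with diagonal covariances, estimated from the hard labels in a single maximum-likelihood step rather than by a full EM run. The closed-form L2 distance between mixture densities stands in for the paper's distance functions. `select_central` is a hypothetical selection rule, not DCSSS itself, that keeps the structures that are closest on average to the rest of the ensemble.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances (an assumed simplification)."""
    weights: List[float]          # mixing proportions, summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component


def gmm_from_partition(points: Sequence[Sequence[float]], labels: Sequence[int],
                       var_floor: float = 1e-6) -> DiagGMM:
    """Summarize one hard cluster structure as a diagonal GMM.

    Assumption: this is a single M-step given fixed labels, not a full EM run.
    var_floor keeps singleton or degenerate clusters from having zero variance.
    """
    n, d = len(points), len(points[0])
    weights, means, variances = [], [], []
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        m = len(members)
        mu = [sum(p[k] for p in members) / m for k in range(d)]
        var = [max(sum((p[k] - mu[k]) ** 2 for p in members) / m, var_floor)
               for k in range(d)]
        weights.append(m / n)
        means.append(mu)
        variances.append(var)
    return DiagGMM(weights, means, variances)


def gaussian_overlap(m1, v1, m2, v2) -> float:
    """Closed form of the integral of N(x; m1, diag v1) * N(x; m2, diag v2) dx."""
    out = 1.0
    for a, s, b, t in zip(m1, v1, m2, v2):
        var = s + t  # per-dimension factor equals N(a; b, s + t)
        out *= math.exp(-(a - b) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return out


def mixture_inner(f: DiagGMM, g: DiagGMM) -> float:
    """Inner product <f, g> of two mixture densities, summed over component pairs."""
    return sum(wa * wb * gaussian_overlap(ma, va, mb, vb)
               for wa, ma, va in zip(f.weights, f.means, f.variances)
               for wb, mb, vb in zip(g.weights, g.means, g.variances))


def l2_distance(f: DiagGMM, g: DiagGMM) -> float:
    """L2 distance between two mixture densities (one possible distribution-based distance)."""
    sq = mixture_inner(f, f) + mixture_inner(g, g) - 2 * mixture_inner(f, g)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative round-off


def select_central(structures: List[DiagGMM], k: int) -> List[int]:
    """Hypothetical selection rule: indices of the k structures whose mean
    distance to all other structures is smallest."""
    n = len(structures)
    avg = []
    for i in range(n):
        dists = [l2_distance(structures[i], structures[j]) for j in range(n) if j != i]
        avg.append(sum(dists) / len(dists))
    return sorted(range(n), key=lambda i: avg[i])[:k]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
    a = gmm_from_partition(pts, [0, 0, 1, 1])
    b = gmm_from_partition(pts, [1, 1, 0, 0])  # same partition, relabeled
    c = gmm_from_partition(pts, [0, 1, 0, 1])  # a different partition
    print(l2_distance(a, b), l2_distance(a, c))
    print(select_central([a, b, c], k=2))
```

Because the L2 distance compares densities rather than label vectors, it does not depend on how clusters are numbered. For that reason the relabeled partition `b` lands at essentially zero distance from `a`, while the structurally different partition `c` is far from both and is the one left out by the selection rule.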
