Spectral clustering ensemble via compositional data clustering

This paper proposes an unsupervised learning algorithm for spectral clustering (SC) ensembles, built around a new consensus function for combining multiple spectral clusterings. The algorithm is suitable for large-scale data (e.g., texture images) and reduces the sensitivity of spectral clustering to its scaling parameter. The individual SC members of the ensemble are generated using randomly chosen scaling parameters and the Nyström approximation, and the labels they produce are treated as new features for each sample. The Hungarian algorithm is used to realign the labels across members, after which a compositional data vector is obtained for each data point by computing the proportion of members that assign it to each label. These compositional vectors are mapped into another space via the logcontrast transform, which addresses the ill-posed problem of compositional data, and the final ensemble result is obtained by clustering the mapped data. Experimental results on UCI data and texture images show that, compared with previous approaches based on hypergraphs and mixture models, the proposed algorithm requires the least computation time while achieving nearly identical accuracy, and that it avoids the need to select an accurate scaling parameter for spectral clustering.
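The consensus step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the exact form of the logcontrast transform, so the centered log-ratio is used here as a stand-in; the small additive constant `eps` for handling zero proportions, the choice of the first partition as the alignment reference, the plain k-means used for the final clustering, and all function names are hypothetical choices made for the example. The base partitions are given directly rather than produced by Nyström-approximated spectral clustering.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_labels(reference, labels, k):
    """Relabel `labels` so its clusters best overlap `reference` (Hungarian algorithm)."""
    overlap = np.zeros((k, k), dtype=int)
    # overlap[i, j] = number of points with reference label i and candidate label j
    np.add.at(overlap, (reference, labels), 1)
    # Maximising total overlap is the same as minimising its negation.
    ref_idx, cand_idx = linear_sum_assignment(-overlap)
    mapping = np.empty(k, dtype=int)
    mapping[cand_idx] = ref_idx
    return mapping[labels]


def composition_vectors(partitions, k):
    """Per point, the fraction of ensemble members voting for each label."""
    partitions = np.asarray(partitions)
    reference = partitions[0]  # assumption: first member is the reference
    aligned = np.stack([reference] + [align_labels(reference, p, k) for p in partitions[1:]])
    counts = np.stack([(aligned == c).sum(axis=0) for c in range(k)], axis=1)
    return counts / len(partitions)  # each row sums to 1


def clr_transform(comp, eps=1e-2):
    """Centered log-ratio transform (stand-in for the logcontrast transform)."""
    smoothed = comp + eps  # assumption: additive replacement of zero proportions
    smoothed = smoothed / smoothed.sum(axis=1, keepdims=True)
    logs = np.log(smoothed)
    return logs - logs.mean(axis=1, keepdims=True)


def kmeans(x, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


# Three toy base partitions of six points into k = 2 clusters.
# The second uses swapped label names; the third disagrees on point 2.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
comp = composition_vectors(partitions, k=2)
final = kmeans(clr_transform(comp), k=2)
print(final)
```

In this toy case the realignment undoes the label swap in the second partition, so point 2 receives the composition (2/3, 1/3) and is still grouped with points 0 and 1 in the consensus result.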
