SCE: A Manifold Regularized Set-Covering Method for Data Partitioning

Cluster analysis plays an important role in data analysis. In recent years, cluster ensembles have drawn much attention as a cluster analysis tool because of their robustness, stability, and accuracy. Much effort has been devoted to combining different initial clustering results into a single clustering solution with better performance. However, existing methods neglect the structure information of the raw data when performing the cluster ensemble. In this paper, we propose a Structural Cluster Ensemble (SCE) algorithm for data partitioning, formulated as a set-covering problem. In particular, we construct a Laplacian regularized objective function to capture the structure information among clusters. Moreover, considering the importance of the discriminative information underlying the initial clustering results, we add a discriminative constraint to the proposed objective function. Finally, we verify the performance of the SCE algorithm on both synthetic and real data sets. The experimental results show the effectiveness of the proposed SCE algorithm.
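
As a rough illustration of what a Laplacian regularizer measures, the sketch below computes the generic graph smoothness penalty tr(F^T L F) for a hard partition, where L = D - W is the unnormalized graph Laplacian and F is the one-hot cluster indicator matrix. This is not the SCE objective itself, which the abstract does not spell out; the k-nearest-neighbour affinity, the function names `knn_affinity` and `laplacian_penalty`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def knn_affinity(X, k=2):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix (an illustrative choice)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; the diagonal is excluded from the search.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    # Symmetrize: keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)

def laplacian_penalty(W, labels):
    """Smoothness penalty tr(F^T L F) with L = D - W and one-hot indicators F.

    For a hard partition this equals twice the total weight of the edges
    that join points assigned to different clusters.
    """
    labels = np.asarray(labels)
    F = np.eye(labels.max() + 1)[labels]   # n x c one-hot indicator matrix
    L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
    return float(np.trace(F.T @ L @ F))

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
W = knn_affinity(X, k=2)

respects_structure = [0, 0, 0, 1, 1, 1]   # follows the two groups
splits_a_group = [0, 0, 1, 1, 1, 1]       # moves one point to the wrong cluster

print(laplacian_penalty(W, respects_structure))
print(laplacian_penalty(W, splits_a_group))
```

A partition that cuts few neighbourhood edges receives a small penalty, so adding such a term to an ensemble objective favours consensus solutions that agree with the geometry of the raw data.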
