A novel intelligent clustering approach for high dimensional data in a big data environment

High dimensional multi-view data are common in complex, large-scale applications in a big data environment. However, traditional clustering algorithms treat all features as equally relevant, which makes them poorly suited to such data. To address this problem, we propose the intelligent weighting k-means clustering approach (IWKM), which combines swarm intelligence with the k-means algorithm. Because k-means is sensitive to its initial cluster centers, IWKM uses the global search capability of swarm intelligence to find the initial cluster centers together with the view weights and feature weights. A weighting k-means procedure then assigns objects to clusters using the initial cluster centers, view weights and feature weights obtained by swarm intelligence. IWKM has two main characteristics. First, every view and every feature has its own weight in the clustering model; these weights, which are calculated by the swarm intelligence algorithm, affect the cluster to which each object is assigned. Second, the degree of coupling between clusters is introduced into the clustering model to enlarge the dissimilarity between clusters. Comprehensive experiments are conducted on three high dimensional multi-view data sets from a machine learning repository. The results are compared with those of five other state-of-the-art clustering algorithms using the Rand index, Jaccard coefficient and Folkes Russel evaluation metrics. The experiments show that the new approach generates better clustering results on high dimensional multi-view data in a big data environment.
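To make the role of view and feature weights concrete, the following is a minimal sketch of a weighted k-means step, not the IWKM implementation itself. It assumes the initial centers, view weights and feature weights are already given (in IWKM they would come from the swarm intelligence search, which is not shown), and it omits the cluster coupling term. The function names, the product form of the weighting, and the toy data are illustrative assumptions.

```python
def weighted_sq_dist(x, c, views, view_w, feat_w):
    """Squared distance in which feature j of view v is scaled by
    view_w[v] * feat_w[j] (an assumed, simple form of two-level weighting)."""
    total = 0.0
    for v, feats in enumerate(views):
        for j in feats:
            total += view_w[v] * feat_w[j] * (x[j] - c[j]) ** 2
    return total


def weighted_kmeans(data, centers, views, view_w, feat_w, max_iter=100):
    """Lloyd-style iterations with fixed weights and given initial centers."""
    centers = [list(c) for c in centers]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: nearest center under the weighted distance.
        new_labels = [
            min(range(len(centers)),
                key=lambda k: weighted_sq_dist(x, centers[k], views, view_w, feat_w))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data: view 0 holds two informative features, view 1 holds one noisy feature.
data = [(0.0, 0.0, 5.0), (0.2, 0.1, -5.0), (5.0, 5.0, 5.1), (5.1, 4.9, -5.0)]
views = [[0, 1], [2]]          # feature indices belonging to each view
view_w = [0.9, 0.1]            # assumed weights, down-weighting the noisy view
feat_w = [0.5, 0.5, 1.0]       # assumed feature weights within each view
labels, centers = weighted_kmeans(data, [data[0], data[2]], views, view_w, feat_w)
print(labels)
```

With these weights the noisy third feature contributes little to the distance, so the objects are grouped by the two informative features. Because the weights are held fixed here, the mean remains the minimizer of the weighted squared distance in the update step.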
