A New Parallel Hierarchical K-Means Clustering Algorithm for Video Retrieval

The K-means clustering algorithm has been widely adopted to build vocabularies in image retrieval, but its speed and accuracy still need improvement. In this manuscript, we propose a new Parallel Hierarchical K-means (PHKM) clustering algorithm for video retrieval. PHKM improves on K-means in two ways. First, the Hellinger kernel replaces the Euclidean distance, which improves accuracy. Second, a parallel clustering algorithm based on multi-core processors is proposed. Experimental results show that the proposed PHKM algorithm is faster and more effective than K-means.
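The abstract does not say how the Hellinger kernel enters the clustering step. As a minimal sketch, assuming non-negative feature vectors (for example, histogram-like descriptors) and the common square-root mapping, under which the Hellinger kernel between two L1-normalized vectors is the sum over i of sqrt(x_i * y_i), the assignment of a point to its most similar centre might look as follows. The function names `hellinger_map`, `hellinger_kernel`, and `assign` are illustrative and do not come from the paper.

```python
import math

def hellinger_map(vec):
    """L1-normalize a non-negative vector, then take element-wise square roots.

    Assumption: inputs are non-negative (e.g. histogram-like descriptors).
    """
    total = sum(vec)
    if total == 0:
        return [0.0] * len(vec)
    return [math.sqrt(v / total) for v in vec]

def hellinger_kernel(x, y):
    """Hellinger kernel: sum_i sqrt(x_i * y_i) on the L1-normalized inputs."""
    return sum(a * b for a, b in zip(hellinger_map(x), hellinger_map(y)))

def assign(point, centers):
    """Return the index of the centre with the highest Hellinger similarity."""
    sims = [hellinger_kernel(point, c) for c in centers]
    return max(range(len(centers)), key=sims.__getitem__)
```

After the mapping, every vector has unit Euclidean length, so ranking centres by Hellinger similarity is equivalent to ranking them by Euclidean distance between the mapped vectors. This means an existing Euclidean K-means routine could in principle be reused on the mapped data.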
