Video similarity detection with video signature clustering

The proliferation of video content on the Web makes similarity detection an indispensable tool in Web data management, searching, and navigation. We have previously proposed a compact representation of video clips, called the video signature, for retrieving similar video clips in large databases. In this paper, we propose a new signature clustering algorithm to further improve retrieval performance. The algorithm treats the set of all signatures as an abstract threshold graph, where the threshold is determined from local data statistics. Clusters of similar clips are identified as highly connected regions in the graph. This algorithm outperforms simple thresholding and hierarchical clustering techniques in identifying a set of manually determined similar clusters from a dataset of 46,356 Web video clips. At 95% precision, our algorithm attains 85% recall, while simple thresholding and the complete-link hierarchical scheme attain 67% and 75% recall, respectively. Applying our algorithm to the entire dataset identifies 6,900 similar clusters, with an average cluster size of 2.81 video clips. The cluster sizes follow a power-law distribution, which has been shown to describe many Web phenomena.
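The threshold-graph idea can be illustrated with a minimal sketch; it is not the algorithm proposed in the paper. It assumes pairwise signature distances are already available, uses a single fixed threshold where the paper derives the threshold from local data statistics, and uses edge density within a connected component as a stand-in for "highly connected". All function and parameter names (`threshold_graph`, `similar_clusters`, `min_density`) are hypothetical.

```python
def threshold_graph(distances, n, threshold):
    """Build adjacency sets: clips i and j share an edge iff distance <= threshold.

    Assumption: `distances` maps (i, j) pairs to a signature distance, and
    `threshold` is one global value (the paper instead uses local statistics).
    """
    adj = {i: set() for i in range(n)}
    for (i, j), d in distances.items():
        if d <= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Return the connected components of the graph as sorted lists of nodes."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(component))
    return components


def edge_density(component, adj):
    """Fraction of possible edges inside the component that are present."""
    n = len(component)
    if n < 2:
        return 0.0
    members = set(component)
    # Each internal edge is counted once from each endpoint, hence the // 2.
    edges = sum(len(adj[u] & members) for u in component) // 2
    return edges / (n * (n - 1) / 2)


def similar_clusters(distances, n, threshold, min_density):
    """Keep multi-clip components that are densely connected.

    Assumption: edge density >= min_density is a simplified proxy for the
    "highly connected regions" mentioned in the abstract.
    """
    adj = threshold_graph(distances, n, threshold)
    return [
        c for c in connected_components(adj)
        if len(c) >= 2 and edge_density(c, adj) >= min_density
    ]


# Toy example: clips 0, 1, 2 are mutually close; clips 3 and 4 are far away.
toy = {(0, 1): 0.10, (0, 2): 0.20, (1, 2): 0.15, (2, 3): 0.80, (3, 4): 0.90}
clusters = similar_clusters(toy, n=5, threshold=0.3, min_density=0.8)
```

In this toy input, the three mutually close clips form one fully connected component, so they pass the density check. A chain-like component, where clips are linked only through intermediaries, would be rejected, and that is the kind of loosely connected grouping a density criterion is meant to filter out.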
