Fast similarity search and clustering of video sequences on the world-wide-web

We define similar video content as video sequences with almost identical content that may have been compressed at different qualities, reformatted to different sizes and frame rates, subjected to minor editing in the spatial or temporal domain, or summarized into keyframe sequences. Building a search engine to identify such similar content on the World-Wide Web requires: 1) robust video similarity measurements; 2) fast similarity search techniques on large databases; and 3) intuitive organization of search results. In a previous paper, we proposed a randomized technique called the video signature (ViSig) method for video similarity measurement. In this paper, we address the remaining two issues by proposing a feature extraction scheme for fast similarity search and a clustering algorithm for identifying clusters of similar video. Like many other content-based methods, the ViSig method uses high-dimensional feature vectors to represent video. To ensure a fast response time for similarity searches on high-dimensional vectors, we propose a novel nonlinear feature extraction scheme on arbitrary metric spaces that combines the triangle inequality with classical Principal Component Analysis (PCA). We show experimentally that the proposed technique outperforms PCA, Fastmap, Triangle-Inequality Pruning, and Haar wavelet on signature data. To further improve retrieval performance and to better organize similarity search results, we introduce a new graph-theoretical clustering algorithm for large databases of signatures. This algorithm treats the set of signatures as an abstract threshold graph, in which the distance threshold is determined from local data statistics. Clusters of similar video are then identified as highly connected regions in the graph. By measuring retrieval performance against a ground-truth set, we show that the proposed algorithm outperforms simple thresholding and single-link and complete-link hierarchical clustering.
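The threshold-graph idea can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper: it assumes a single global threshold rather than one derived from local data statistics, it uses an L1 distance on toy tuples in place of real ViSig signatures, and it approximates "highly connected" by a hypothetical edge-density cutoff (`min_density`). All function and parameter names are illustrative assumptions.

```python
from itertools import combinations


def l1_distance(a, b):
    """L1 distance between two equal-length tuples (stand-in metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def threshold_graph_clusters(signatures, threshold, min_density=0.5,
                             dist=l1_distance):
    """Group items whose threshold graph component is dense enough.

    Assumptions (not from the paper): one global threshold, and a
    component counts as a cluster when its edge density, i.e. the
    number of edges divided by the number of possible edges, is at
    least min_density.
    """
    n = len(signatures)
    parent = list(range(n))  # union-find forest over item indices

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Build the threshold graph: an edge joins items within the threshold.
    edges = []
    for i, j in combinations(range(n), 2):
        if dist(signatures[i], signatures[j]) <= threshold:
            edges.append((i, j))
            parent[find(j)] = find(i)

    # Collect the members and the edge count of each connected component.
    members = {}
    for i in range(n):
        members.setdefault(find(i), []).append(i)
    edge_count = {}
    for i, _ in edges:
        root = find(i)
        edge_count[root] = edge_count.get(root, 0) + 1

    # Keep components with at least two items and sufficient density.
    clusters = []
    for root, nodes in members.items():
        m = len(nodes)
        if m < 2:
            continue
        density = edge_count[root] / (m * (m - 1) / 2)
        if density >= min_density:
            clusters.append(nodes)
    return sorted(clusters)


toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(threshold_graph_clusters(toy, threshold=2))
```

With `min_density` set to 0 the sketch reduces to connected components of the threshold graph, which is the same partition a single-link hierarchy gives when cut at that threshold. The density cutoff is a crude way to reject long, sparsely connected chains; the paper's actual criterion for highly connected regions is not specified in this abstract.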
