Fast coarse-to-fine video retrieval using shot-level spatio-temporal statistics

In this paper, we propose a fast coarse-to-fine video retrieval scheme using shot-level spatio-temporal statistics. The scheme consists of a two-step coarse search followed by a fine search. In the coarse search stage, shot-level motion and color distributions are computed as spatio-temporal features for shot matching. The first coarse-search step uses these shot-level global statistics to drastically reduce the size of the search space. The second coarse-search step adds a shot adjacent to the first query shot, introducing a "causality" relation between two consecutive shots to improve search accuracy. Finally, the fine-search step refines the result using local color features extracted from the key frames of the query shots. Our experimental results show that the proposed method achieves good retrieval performance at much lower complexity than single-pass methods.
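The three stages can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature layout, the distance measure, or any thresholds, so the `global` and `local` feature vectors, the Euclidean distance, the `coarse_thresh` parameter, and the function name `coarse_to_fine_search` are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def coarse_to_fine_search(query, database, coarse_thresh=0.5):
    """Return indices of candidate database shots, best match first.

    query, database: lists of shots in temporal order. Each shot is a dict
    with two hypothetical feature vectors:
      'global' - shot-level motion/color statistics (coarse search)
      'local'  - key-frame color feature (fine search)
    """
    # Coarse step 1: keep shots whose global statistics are close to those
    # of the first query shot. This prunes most of the search space.
    candidates = [
        i for i, shot in enumerate(database)
        if dist(shot["global"], query[0]["global"]) <= coarse_thresh
    ]

    # Coarse step 2: require that the next database shot also matches the
    # next query shot (the "causality" relation between consecutive shots).
    if len(query) > 1:
        candidates = [
            i for i in candidates
            if i + 1 < len(database)
            and dist(database[i + 1]["global"], query[1]["global"]) <= coarse_thresh
        ]

    # Fine step: rank the few remaining candidates by the more expensive
    # local key-frame feature.
    return sorted(
        candidates,
        key=lambda i: dist(database[i]["local"], query[0]["local"]),
    )
```

Because the fine comparison runs only on the shots that survive both coarse steps, the costly local features are evaluated for a small fraction of the database, which is the source of the complexity reduction described above.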
