Video Shot Boundary Detection using Visual Bag-of-Words

Techniques used in image analysis and video processing have recently converged. Many computation- and memory-intensive image analysis methods have become available for per-frame processing of videos, owing to the increased computing power of desktop computers and efficient implementations on multiple cores and graphics processing units (GPUs). As our main contribution in this work, we solve the problem of shot boundary detection using a popular image analysis (object detection) approach: visual bag-of-words (BoW). The baseline approach to shot boundary detection has been the colour histogram, which is at the core of many top methods, but our BoW method, of similar complexity in terms of parameters, clearly outperforms colour histograms. Interestingly, an “AND-combination” of colour and BoW histogram detection is clearly superior, indicating that colour and local features provide complementary information for video analysis.
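The “AND-combination” can be illustrated with a minimal Python sketch. The abstract does not specify how the histograms are compared or thresholded, so everything below is an assumption made for illustration: per-frame histograms are taken as already computed, consecutive frames are compared with an L1 distance on normalised histograms, a fixed threshold marks a boundary, and the function names (`detect_cuts`, `and_combination`) are hypothetical. A boundary is reported only when both the colour detector and the BoW detector fire on the same frame.

```python
from typing import List, Sequence, Set


def normalise(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1 (left unchanged if it is empty)."""
    total = sum(hist)
    return [v / total for v in hist] if total else list(hist)


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences (assumed measure, not from the paper)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(hists: Sequence[Sequence[float]], threshold: float) -> Set[int]:
    """Return frame indices i whose histogram differs from frame i-1 by more than threshold."""
    cuts = set()
    for i in range(1, len(hists)):
        if l1_distance(normalise(hists[i - 1]), normalise(hists[i])) > threshold:
            cuts.add(i)
    return cuts


def and_combination(colour_hists, bow_hists, colour_thr: float, bow_thr: float) -> List[int]:
    """Keep only the boundaries on which the colour and BoW detectors agree."""
    return sorted(detect_cuts(colour_hists, colour_thr) & detect_cuts(bow_hists, bow_thr))


# Toy data (hypothetical): five frames, 3-bin colour and 4-bin BoW histograms.
# Colour changes at frames 2 and 3; the visual-word content changes only at frame 3.
colour = [[8, 1, 1], [8, 1, 1], [1, 8, 1], [1, 1, 8], [1, 1, 8]]
bow = [[5, 5, 0, 0], [5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]

boundaries = and_combination(colour, bow, colour_thr=0.5, bow_thr=0.5)
```

In this toy data the colour detector alone fires at frame 2, where the local-feature content is unchanged, so the intersection suppresses that detection and keeps only the frame where both cues agree. This is one way in which complementary cues could reduce false positives; the thresholds and distance measure here are placeholders, not values from the work.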
