Shot boundary detection using scale invariant feature matching

This paper presents a shot boundary detection (SBD) method that finds boundaries between shots using changes in visual content elements such as objects, actors, and background. The method is based on the property that features do not change significantly within a shot, whereas they change substantially across a shot boundary. Building on this property, we propose an SBD algorithm that uses scale- and rotation-invariant local image descriptors. To obtain information about the content elements, we employ the scale invariant feature transform (SIFT), which is commonly used in object recognition. The number of matched points is large between frames of the same shot, whereas few or no matched points are found at a shot boundary because all the elements in the previous shot change abruptly in the next shot. We can therefore determine the existence of a shot boundary from the number of matched points. We identify two types of shot boundaries, hard cuts and gradual transitions (such as tilting, panning, and fade in/out), using an adjustable frame distance between the frames being compared. Experimental results on four test videos show the effectiveness of the proposed SBD algorithm using scale invariant feature matching.
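The decision rule described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the function name `detect_shot_boundaries`, the threshold `min_matches`, and the toy matcher `count_shared` are hypothetical, and each frame is stood in for by a set of feature identifiers rather than by real SIFT descriptors. The sketch only shows how a low match count between two frames separated by an adjustable frame distance could be flagged as a boundary candidate.

```python
from typing import Callable, List, Sequence, Set


def detect_shot_boundaries(
    frames: Sequence,
    count_matches: Callable[[object, object], int],
    frame_distance: int = 1,
    min_matches: int = 10,
) -> List[int]:
    """Return every index i for which frames[i] and frames[i + frame_distance]
    share fewer than min_matches matched points.

    frame_distance and min_matches are assumed tuning parameters; the
    abstract does not give concrete values for them.
    """
    if frame_distance < 1:
        raise ValueError("frame_distance must be at least 1")
    boundaries = []
    for i in range(len(frames) - frame_distance):
        matched = count_matches(frames[i], frames[i + frame_distance])
        if matched < min_matches:
            boundaries.append(i)
    return boundaries


def count_shared(a: Set[int], b: Set[int]) -> int:
    """Toy stand-in for SIFT matching: count feature ids present in both frames."""
    return len(a & b)


# Toy data (hypothetical): three frames of one shot followed by two frames of
# another. Features drift slightly within a shot and change completely at the cut.
toy_frames = [
    set(range(0, 20)),
    set(range(0, 20)),
    set(range(2, 22)),
    set(range(100, 120)),
    set(range(100, 120)),
]

# Compare adjacent frames; only the pair spanning the cut falls below the threshold.
hard_cuts = detect_shot_boundaries(toy_frames, count_shared, frame_distance=1, min_matches=5)
print(hard_cuts)
```

With real video, `count_matches` would be replaced by an actual SIFT matcher, for example keypoint detection and descriptor matching from a computer vision library. A larger `frame_distance` compares frames that are further apart, which is one way gradual transitions might be exposed when adjacent frames still share many points.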
