Two Stage Shot Boundary Detection via Feature Fusion and Spatial-Temporal Convolutional Neural Networks

Shot boundary detection locates the frames at which a video changes from one shot to another. It has been studied actively in video analysis and management, and has become a key technique as rich and diverse video content proliferates. Because shots differ widely in length and in how their content varies, this paper presents a two-stage method for shot boundary detection (TSSBD) that detects abrupt shot changes by fusing color histogram and deep features, and locates gradual shot changes with C3D-based deep analysis. Abrupt shot changes are detected first, since each occurs between two adjacent frames; these detections divide the complete video into segments containing gradual transitions. Over these video segments, gradual shot change detection is performed with a 3D convolutional neural network, which classifies clips into specific gradual shot change types. Finally, an effective merging strategy is proposed to locate the positions of gradual shot transitions. The experimental analysis shows that the proposed progressive method detects both abrupt and gradual shot transitions accurately.
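The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: stage one here uses only a color histogram difference (the paper additionally fuses deep features), the 3D CNN clip classifier is assumed to exist and is represented only by its output labels, and the merging rule is a simple stand-in because the abstract does not specify the actual strategy. All function names, the `threshold` value, and the label strings are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (r, g, b), each in 0..255
Clip = Tuple[int, int, str]           # (start_frame, end_frame, label)


def color_histogram(frame: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins**3 cells."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each channel value 0..255 to a bin index 0..bins-1.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = float(len(frame)) or 1.0
    return [h / total for h in hist]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames: Sequence[Sequence[Pixel]], threshold: float = 0.5) -> List[int]:
    """Stage 1 (assumed rule): index i is a cut if frames i-1 and i differ strongly."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


def split_segments(n_frames: int, cuts: Sequence[int]) -> List[Tuple[int, int]]:
    """Split the video into half-open segments [start, end) at the detected cuts."""
    bounds = [0] + list(cuts) + [n_frames]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


def merge_clip_labels(clips: Sequence[Clip], normal: str = "normal") -> List[Clip]:
    """Stage 2 post-processing (assumed rule): fuse overlapping or touching
    clips that the classifier gave the same gradual-transition label."""
    merged: List[Clip] = []
    for start, end, label in sorted(clips):
        if label == normal:
            continue  # clips without a transition are dropped
        if merged and merged[-1][2] == label and start <= merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged
```

In use, `detect_cuts` would run over all frames, `split_segments` would hand each segment to the clip classifier, and `merge_clip_labels` would turn the per-clip labels into transition intervals. A fixed threshold is the simplest possible choice; the abstract does not say how the fused features are thresholded.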
