Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks

Shot boundary detection (SBD) is an important pre-processing step for video manipulation. Here, each segment of frames is classified as a sharp transition, a gradual transition, or no transition. Current SBD techniques analyze hand-crafted features and attempt to optimize both detection accuracy and processing speed. However, the heavy computation of optical flow prevents this. To achieve this aim, we present an SBD technique based on spatio-temporal Convolutional Neural Networks (CNN). Since current datasets are not large enough to train an accurate SBD CNN, we present a new dataset containing more than 3.5 million frames of sharp and gradual transitions. The transitions are generated synthetically using image compositing models. Our dataset also contains an additional 70,000 frames of important hard-negative no-transition examples. We perform the largest evaluation to date for one SBD algorithm, on real and synthetic data, containing more than 4.85 million frames. In comparison to the state of the art, we achieve better detection of gradual dissolve transitions, competitive performance on sharp transitions, and a significant improvement on wipes. In addition, we are up to 11 times faster than the state of the art.
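To make the idea of synthetic transitions concrete, the following is a minimal sketch of how a dissolve and a wipe could be composited from two frames. The abstract does not specify the compositing models used, so the linear alpha blend, the left-to-right wipe, and the names `dissolve`, `wipe`, and `Frame` are illustrative assumptions rather than the authors' method.

```python
from typing import List

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = List[List[float]]


def dissolve(a: Frame, b: Frame, n_frames: int) -> List[Frame]:
    """Cross-fade from frame a to frame b over n_frames (n_frames >= 2).

    Hypothetical linear blend: out = (1 - alpha) * a + alpha * b.
    """
    out = []
    for t in range(n_frames):
        alpha = t / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        out.append([
            [(1 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)
        ])
    return out


def wipe(a: Frame, b: Frame, n_frames: int) -> List[Frame]:
    """Left-to-right wipe: columns left of a moving edge are taken from b."""
    width = len(a[0])
    out = []
    for t in range(n_frames):
        edge = round(width * t / (n_frames - 1))  # edge position in columns
        out.append([rb[:edge] + ra[edge:] for ra, rb in zip(a, b)])
    return out


# Usage: two tiny 2x4 frames, one black and one white.
black = [[0.0] * 4 for _ in range(2)]
white = [[1.0] * 4 for _ in range(2)]
fade = dissolve(black, white, 3)
swipe = wipe(black, white, 3)
```

In a sketch like this, every generated segment carries a known label (dissolve, wipe, or, for a hard cut, a two-frame switch), which is what would allow a large labeled training set to be produced without manual annotation.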
