Keyframes and Shot Boundaries: The Attributes of Scene Segmentation and Classification

Video analytics for real-life scenarios deals with multimedia data whose statistics can be characterized by the multimodal features of the video components. The large variety of low-scale multimodal features of the objects makes discrimination and analysis challenging. In addition, occlusion, varied illumination, and complex environmental conditions make video parsing a challenging research problem. For experimental purposes, the vital components of a video are its scenes, shots, keyframes, objects, and background. In this work, we focus on keyframes and shot boundaries for scene segmentation of sample videos taken from YouTube. The structural similarity index (SSIM) of the shots is computed from the similarities of their LBP and HSV color histograms. Motion similarity and inverse time proximity are added to generate a shot similarity graph. Sliding window methods are used to group similar shots. The proposed scene segmentation approach is validated on six videos of varied semantic content featuring human beings and animals. The playing time of the videos ranges from 0.5 to 15 min, and the total number of scenes per video ranges from 6 to 33.
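The pipeline described above (per-shot visual similarity, motion similarity, inverse time proximity, a shot similarity graph, and sliding-window grouping) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: histogram intersection stands in for the SSIM-based visual score, the inverse time proximity is taken as 1 / (1 + |i - j|), and the weights, window size, threshold, and the names `shot_similarity`, `build_graph`, and `group_shots` are hypothetical.

```python
# Illustrative sketch only: the weights, threshold, window size, and the exact
# form of each similarity term are assumptions, not the paper's formulation.

def hist_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def shot_similarity(a, b, w_visual=0.6, w_motion=0.2, w_time=0.2):
    """Weighted sum of visual, motion, and inverse time proximity terms."""
    # Visual term: mean of LBP and HSV histogram similarities
    # (histogram intersection is a stand-in for the SSIM-based score).
    visual = 0.5 * (hist_intersection(a["lbp"], b["lbp"])
                    + hist_intersection(a["hsv"], b["hsv"]))
    # Motion term: assumes a scalar motion magnitude scaled to [0, 1].
    motion = 1.0 - abs(a["motion"] - b["motion"])
    # Inverse time proximity: decays with the distance between shot indices.
    time_prox = 1.0 / (1.0 + abs(a["index"] - b["index"]))
    return w_visual * visual + w_motion * motion + w_time * time_prox

def build_graph(shots):
    """Dense shot similarity graph as an n x n adjacency matrix."""
    n = len(shots)
    return [[shot_similarity(shots[i], shots[j]) for j in range(n)]
            for i in range(n)]

def group_shots(graph, window=3, threshold=0.6):
    """Assign a scene label to each shot using a backward sliding window."""
    labels = [0]
    for i in range(1, len(graph)):
        current = labels[-1]
        start = max(0, i - window)
        # Compare shot i with earlier shots in the window that belong
        # to the scene currently being built.
        best = max(graph[i][j] for j in range(start, i) if labels[j] == current)
        labels.append(current if best >= threshold else current + 1)
    return labels

# Four toy shots: the first two look alike, the last two look alike.
shots = [
    {"index": 0, "lbp": [0.5, 0.5, 0.0, 0.0], "hsv": [1.0, 0.0, 0.0, 0.0], "motion": 0.1},
    {"index": 1, "lbp": [0.5, 0.5, 0.0, 0.0], "hsv": [0.9, 0.1, 0.0, 0.0], "motion": 0.2},
    {"index": 2, "lbp": [0.0, 0.0, 0.5, 0.5], "hsv": [0.0, 0.0, 1.0, 0.0], "motion": 0.9},
    {"index": 3, "lbp": [0.0, 0.0, 0.4, 0.6], "hsv": [0.0, 0.0, 0.9, 0.1], "motion": 0.8},
]
labels = group_shots(build_graph(shots))
print(labels)  # [0, 0, 1, 1]
```

In this toy input, a scene boundary falls between shots 1 and 2 because no shot inside the window reaches the assumed threshold. In practice the histograms would come from keyframes of each detected shot, and the weights and threshold would need tuning per dataset.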
