Saliency Detection for Semantic Segmentation of Videos

Semantic segmentation has progressed remarkably in recent years, yet applying it to video-based applications remains challenging. Videos involve a significantly larger volume of data than images; a typical video contains around 30 frames per second. Segmenting near-identical consecutive frames adds unnecessarily to the time required to segment the complete video. In this paper, we propose a contour detection-based approach for detecting salient frames, enabling faster semantic segmentation of videos. We detect the salient frames of the video and pass only those frames through the segmentation block; the segmented labels of the salient frames are then mapped to the non-salient frames. A salient frame is defined by the variation in the pixel values of the background-subtracted frames. Background subtraction is performed with the MOG2 background subtractor algorithm, which is used for background subtraction under various lighting conditions. We demonstrate the results using a PyTorch model for semantic segmentation of images, and we propose to concatenate this segmentation model with our framework. We evaluate the framework by comparing the time taken and the mean Intersection over Union (mIoU) for segmenting a video with and without passing the video input through the framework. We evaluate the Saliency Detection Block using the retention ratio and the condensation ratio as quality metrics.
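The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact saliency criterion, so the code assumes a frame is salient when its foreground pixel count differs from that of the last salient frame by more than a hypothetical `threshold`. The function names (`select_salient_frames`, `propagate_labels`, `mean_iou`) and the `segment` callback are likewise illustrative. The foreground counts are taken as given; in practice they might be obtained by applying OpenCV's MOG2 subtractor to each frame and counting the non-zero mask pixels.

```python
from typing import Callable, List, Sequence, TypeVar

Label = TypeVar("Label")


def select_salient_frames(fg_counts: Sequence[int], threshold: float) -> List[int]:
    """Return indices of frames treated as salient.

    Assumption (not from the paper): a frame is salient when its foreground
    pixel count differs from the last salient frame's count by more than
    `threshold`. The first frame is always salient.
    """
    salient: List[int] = []
    reference = None
    for index, count in enumerate(fg_counts):
        if reference is None or abs(count - reference) > threshold:
            salient.append(index)
            reference = count
    return salient


def propagate_labels(
    num_frames: int,
    salient: Sequence[int],
    segment: Callable[[int], Label],
) -> List[Label]:
    """Segment only the salient frames; every non-salient frame reuses the
    labels of the most recent salient frame before it."""
    salient_set = set(salient)
    labels: List[Label] = []
    current = None
    for index in range(num_frames):
        if index in salient_set:
            current = segment(index)  # the expensive call happens only here
        labels.append(current)
    return labels


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union across the classes that occur in either
    sequence; `pred` and `target` are flattened per-pixel class labels."""
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    counts = [0, 2, 3, 50, 52, 10]  # made-up foreground pixel counts per frame
    keyframes = select_salient_frames(counts, threshold=20)
    labels = propagate_labels(len(counts), keyframes, lambda i: f"seg{i}")
    print(keyframes, labels)
```

Comparing against the last salient frame rather than the immediately preceding frame is a design choice in this sketch: it keeps slow, gradual scene changes from going undetected, at the cost of occasionally selecting more frames.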
