A Newly Developed Ground Truth Dataset for Visual Saliency in Videos

Visual saliency models aim to detect the important and eye-catching portions of a scene by exploiting characteristics of the human visual system. The effectiveness of a visual saliency model is evaluated by comparing its saliency maps with a ground truth data set. In recent years, several visual saliency computation algorithms and ground truth data sets have been proposed for images. However, there is a lack of ground truth data sets for videos. A new human-labeled ground truth is prepared for video sequences that are commonly used in video coding. The selected videos come from different genres, including conversational, sports, outdoor, and indoor, and have low, medium, and high motion. A saliency mask is obtained for each video from nine different subjects, who are asked to label the salient region in each frame in the form of a rectangular bounding box. A majority voting criterion is used to construct the final ground truth saliency mask for each frame. Sixteen different state-of-the-art visual saliency algorithms are selected for comparison, and their effectiveness is computed quantitatively on the newly developed ground truth. The results show that, across the different genres and motion types, the multiple kernel learning-based saliency algorithm performs best in terms of F-measure and the spectral residual-based saliency algorithm performs best in terms of execution time.
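
The labeling and evaluation procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact voting rule or the F-measure weighting, so the sketch assumes a pixel-wise strict majority (more than half of the subjects) and the weight beta^2 = 0.3 that is often used in salient object detection. The names `box_to_mask`, `majority_vote`, and `f_measure` are hypothetical, and the predicted saliency map is assumed to be already binarized.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive. Assumed convention.
Box = Tuple[int, int, int, int]
Mask = List[List[int]]


def box_to_mask(box: Box, height: int, width: int) -> Mask:
    """Rasterize one subject's rectangular label into a binary mask."""
    top, left, bottom, right = box
    return [
        [1 if top <= r < bottom and left <= c < right else 0 for c in range(width)]
        for r in range(height)
    ]


def majority_vote(boxes: List[Box], height: int, width: int) -> Mask:
    """Mark a pixel salient if more than half of the subjects labeled it.

    Assumption: the vote is taken pixel-wise with a strict majority.
    """
    votes = [[0] * width for _ in range(height)]
    for box in boxes:
        mask = box_to_mask(box, height, width)
        for r in range(height):
            for c in range(width):
                votes[r][c] += mask[r][c]
    needed = len(boxes) // 2 + 1  # e.g. 5 of 9 subjects
    return [[1 if v >= needed else 0 for v in row] for row in votes]


def f_measure(pred: Mask, truth: Mask, beta_sq: float = 0.3) -> float:
    """Weighted F-measure between a binary prediction and the ground truth.

    Assumption: beta_sq = 0.3; the abstract does not give the value.
    """
    tp = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    if tp == 0:
        return 0.0
    precision = tp / sum(map(sum, pred))
    recall = tp / sum(map(sum, truth))
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    # Three hypothetical subjects labeling a 4x4 frame.
    labels = [(0, 0, 2, 2), (0, 0, 2, 2), (1, 1, 3, 3)]
    ground_truth = majority_vote(labels, 4, 4)
    prediction = box_to_mask((0, 0, 2, 4), 4, 4)
    print(ground_truth)
    print(round(f_measure(prediction, ground_truth), 4))
```

With nine subjects, the same function requires at least five overlapping labels for a pixel to enter the ground truth mask. Other voting rules, such as voting on box coordinates rather than pixels, would also be consistent with the description.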
