Global and Local Sensitivity Guided Key Salient Object Re-augmentation for Video Saliency Detection

The existing still-static deep learning based saliency researches do not consider the weighting and highlighting of extracted features from different layers, all features contribute equally to the final saliency decision-making. Such methods always evenly detect all "potentially significant regions" and unable to highlight the key salient object, resulting in detection failure of dynamic scenes. In this paper, based on the fact that salient areas in videos are relatively small and concentrated, we propose a \textbf{key salient object re-augmentation method (KSORA) using top-down semantic knowledge and bottom-up feature guidance} to improve detection accuracy in video scenes. KSORA includes two sub-modules (WFE and KOS): WFE processes local salient feature selection using bottom-up strategy, while KOS ranks each object in global fashion by top-down statistical knowledge, and chooses the most critical object area for local enhancement. The proposed KSORA can not only strengthen the saliency value of the local key salient object but also ensure global saliency consistency. Results on three benchmark datasets suggest that our model has the capability of improving the detection accuracy on complex scenes. The significant performance of KSORA, with a speed of 17FPS on modern GPUs, has been verified by comparisons with other ten state-of-the-art algorithms.

[1]  Stan Sclaroff,et al.  Exploiting Surroundedness for Saliency Detection: A Boolean Map Approach , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Ramesh Raskar,et al.  Learning Gaze Transitions from Depth to Improve Video Saliency Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Peyman Milanfar,et al.  Static and space-time visual saliency detection by self-resemblance. , 2009, Journal of vision.

[4]  Ali Borji,et al.  Revisiting Video Saliency: A Large-Scale Benchmark and a New Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Zhe Wang,et al.  Image Saliency Prediction in Transformed Domain: A Deep Complex Neural Network Method , 2019, AAAI.

[6]  Liang Wang,et al.  Mask-Guided Contrastive Attention Model for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Mei Han,et al.  Category-Independent Object-Level Saliency Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[8]  Huchuan Lu,et al.  Detect Globally, Refine Locally: A Novel Approach to Saliency Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  James J. Clark,et al.  Going from Image to Video Saliency: Augmenting Image Salience with Dynamic Attentional Push , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Dongbing Gu,et al.  Abrupt motion tracking using a visual saliency embedded particle filter , 2014, Pattern Recognit..

[11]  Nuno Vasconcelos,et al.  How many bits does it take for a stimulus to be salient? , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Weisi Lin,et al.  A Video Saliency Detection Model in Compressed Domain , 2014, IEEE Transactions on Circuits and Systems for Video Technology.

[13]  Yueting Zhuang,et al.  DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.

[14]  Radomír Mech,et al.  Minimum Barrier Salient Object Detection at 80 FPS , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Ming-Hsuan Yang,et al.  Top-down visual saliency via joint CRF and dictionary learning , 2012, CVPR.

[16]  Han Wang,et al.  Salient Object Detection With Spatiotemporal Background Priors for Video , 2017, IEEE Transactions on Image Processing.

[17]  Yuan Xie,et al.  Flow Guided Recurrent Neural Encoder for Video Salient Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Noel E. O'Connor,et al.  SalGAN: Visual Saliency Prediction with Generative Adversarial Networks , 2017, ArXiv.

[19]  Wenguan Wang,et al.  Deep Visual Attention Prediction , 2017, IEEE Transactions on Image Processing.

[20]  Tie Liu,et al.  DeepVS: A Deep Learning Based Video Saliency Prediction Approach , 2018, ECCV.

[21]  Noel E. O'Connor,et al.  Shallow and Deep Convolutional Networks for Saliency Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Xiaoyan Sun,et al.  Learning to Detect Video Saliency With HEVC Features , 2017, IEEE Transactions on Image Processing.

[23]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[24]  Mingli Song,et al.  Manifold Ranking-Based Matrix Factorization for Saliency Detection , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[25]  Aykut Erdem,et al.  Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction , 2016, IEEE Transactions on Multimedia.

[26]  Huchuan Lu,et al.  Kernelized Subspace Ranking for Saliency Detection , 2016, ECCV.

[27]  Huchuan Lu,et al.  Learning to Detect Salient Objects with Image-Level Supervision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Jianmin Jiang,et al.  Foveated convolutional neural networks for video summarization , 2018, Multimedia Tools and Applications.

[29]  Qinghua Hu,et al.  SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection , 2018, IEEE Transactions on Cybernetics.

[30]  Haibin Ling,et al.  Revisiting Video Saliency Prediction in the Deep Learning Era , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Liqing Zhang,et al.  Dynamic visual attention: searching for coding length increments , 2008, NIPS.

[32]  Wei Wu,et al.  End-to-End Flow Correlation Tracking with Spatial-Temporal Attention , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Huchuan Lu,et al.  Ranking Saliency , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Xiaogang Wang,et al.  Person Re-Identification by Saliency Learning , 2014 .

[35]  Zhou Wang,et al.  Video saliency incorporating spatiotemporal cues and uncertainty weighting , 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[36]  Victor Leboran,et al.  Dynamic Whitening Saliency , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Shao-Yi Chien,et al.  Real-Time Salient Object Detection with a Minimum Spanning Tree , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Huchuan Lu,et al.  Learning Uncertain Convolutional Features for Accurate Saliency Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Gang Wang,et al.  A Bi-Directional Message Passing Model for Salient Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Seong Joon Oh,et al.  Exploiting Saliency for Object Segmentation from Image Level Labels , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[43]  Junliang Xing,et al.  Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Zulin Wang,et al.  Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM , 2017, ECCV.

[45]  Huchuan Lu,et al.  Salient Object Detection with Recurrent Fully Convolutional Networks , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Xuelong Li,et al.  Two-Stage Learning to Predict Human Eye Fixations via SDAEs , 2016, IEEE Transactions on Cybernetics.

[47]  Weidong Cai,et al.  Reversion Correction and Regularized Random Walk Ranking for Saliency Detection , 2018, IEEE Transactions on Image Processing.

[48]  Deepu Rajan,et al.  Salient Motion Detection in Compressed Domain , 2013, IEEE Signal Processing Letters.

[49]  Chu-Song Chen,et al.  Salient object detection via local saliency estimation and global homogeneity refinement , 2014, Pattern Recognit..

[50]  Kate Saenko,et al.  Top-Down Visual Saliency Guided by Captions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Ling Shao,et al.  Video Salient Object Detection via Fully Convolutional Networks , 2017, IEEE Transactions on Image Processing.

[52]  Ling Shao,et al.  Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement , 2015, IEEE Transactions on Image Processing.

[53]  Qi Zhao,et al.  SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[54]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Xiaojun Duan,et al.  Saliency-guided level set model for automatic object segmentation , 2019, Pattern Recognit..

[56]  Christof Koch,et al.  Image Signature: Highlighting Sparse Salient Regions , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Fatih Murat Porikli,et al.  Saliency-aware geodesic video object segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Cristian Sminchisescu,et al.  Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Amir Roshan Zamir,et al.  Action Recognition in Realistic Sports Videos , 2014 .

[60]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61]  Lihi Zelnik-Manor,et al.  Learning Video Saliency from Human Gaze Using Candidate Selection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Rita Cucchiara,et al.  Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model , 2016, IEEE Transactions on Image Processing.

[63]  Liming Zhang,et al.  A Novel Multiresolution Spatiotemporal Saliency Detection Model and Its Applications in Image and Video Compression , 2010, IEEE Transactions on Image Processing.

[64]  Jongwook Choi,et al.  Supervising Neural Attention Models for Video Captioning by Human Gaze Data , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Huchuan Lu,et al.  Hyperfusion-Net: Hyper-densely reflective feature fusion for salient object detection , 2019, Pattern Recognit..