Paper Information - SaltiNet: Scan-Path Prediction on 360 Degree Images Using Saliency Volumes

We introduce SaltiNet, a deep neural network for scan-path prediction trained on 360-degree images. The model is based on a novel temporal-aware representation of saliency information named the saliency volume. The first part of the network is a model trained to generate saliency volumes; its parameters are fit by back-propagation of a binary cross entropy (BCE) loss computed over downsampled versions of the saliency volumes. Sampling strategies over these volumes are then used to generate scan-paths over the 360-degree images. Our experiments show the advantages of using saliency volumes and how they can be used for related tasks. Our source code and trained models are available at https://github.com/massens/saliency-360salient-2017.
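The two mechanisms named in the abstract, a BCE loss over saliency volumes and sampling a scan-path from a volume, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a saliency volume is an array of shape (T, H, W) whose temporal slices can be treated as unnormalized probability maps, and it uses the simplest sampling strategy one might try: one fixation drawn independently from each of a few evenly spaced slices. The function names `bce_loss` and `sample_scanpath` and all parameters are hypothetical.

```python
import numpy as np


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy between two arrays with values in [0, 1]."""
    # Clip predictions so that log() never receives exactly 0.
    pred = np.clip(pred, eps, 1.0 - eps)
    per_voxel = target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)
    return float(-np.mean(per_voxel))


def sample_scanpath(volume, n_fixations, rng):
    """Draw one (t, y, x) fixation from each of n_fixations temporal slices.

    volume: non-negative array of shape (T, H, W). Each slice is treated
    as an unnormalized probability map (an assumption of this sketch).
    """
    n_slices, height, width = volume.shape
    # Evenly spaced slice indices covering the whole temporal axis.
    slice_ids = np.linspace(0, n_slices - 1, n_fixations).round().astype(int)
    path = []
    for t in slice_ids:
        weights = volume[t].ravel().astype(float)
        total = weights.sum()
        if total > 0:
            probs = weights / total
        else:
            # Fall back to a uniform map if the slice carries no mass.
            probs = np.full(height * width, 1.0 / (height * width))
        flat_index = rng.choice(height * width, p=probs)
        y, x = divmod(int(flat_index), width)
        path.append((int(t), y, x))
    return path


# Toy volume: 4 temporal slices over an 8 x 16 grid, random saliency values.
rng = np.random.default_rng(0)
toy_volume = rng.random((4, 8, 16))
toy_path = sample_scanpath(toy_volume, n_fixations=4, rng=rng)
toy_loss = bce_loss(np.full(toy_volume.shape, 0.5), toy_volume)
```

Sampling each slice independently ignores the distance between consecutive fixations, so a more realistic strategy would likely condition each draw on the previous location. The sketch only shows how a volume with a temporal axis turns scan-path generation into a sampling problem.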
