Saliency Prediction on Omnidirectional Image With Generative Adversarial Imitation Learning

When watching omnidirectional images (ODIs), subjects can access different viewports by moving their heads, so it is necessary to predict subjects' head fixations on ODIs. Inspired by generative adversarial imitation learning (GAIL), this paper proposes a novel approach, named SalGAIL, to predict the saliency of head fixations on ODIs. First, we establish a dataset for attention on ODIs (AOI). In contrast to traditional datasets, our AOI dataset is large-scale, containing the head fixations of 30 subjects viewing 600 ODIs. Next, we mine our AOI dataset and report three findings: (1) head fixations are consistent among subjects, and this consistency grows as the number of subjects increases; (2) head fixations exhibit a front center bias (FCB); and (3) the magnitude of head movement is similar across subjects. Based on these findings, our SalGAIL approach applies deep reinforcement learning (DRL) to predict the head fixations of a single subject, in which the reward of DRL is learned by GAIL rather than designed by hand as in traditional approaches. Then, multi-stream DRL is developed to yield the head fixations of different subjects, and the saliency map of an ODI is generated by convolving the predicted head fixations. Finally, experiments validate that our approach predicts saliency maps of ODIs significantly better than 11 state-of-the-art approaches. Our AOI dataset and the code of SalGAIL are available at https://github.com/yanglixiaoshen/SalGAIL.
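The final step above, turning predicted head fixations into a saliency map via convolution, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the equirectangular resolution, the Gaussian kernel width `sigma`, and the example fixation coordinates are all assumed values, and the common Gaussian-smoothing formulation of "convolving fixations" is taken as the intended operation. The horizontal `wrap` mode reflects the fact that an ODI is periodic in longitude.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_from_fixations(fixations, height=256, width=512, sigma=10.0):
    """Accumulate head fixations (x, y in pixels on an equirectangular grid)
    and smooth the fixation map with a Gaussian kernel to form a saliency map."""
    fixation_map = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        # Clamp/wrap coordinates onto the grid; each fixation adds one count.
        fixation_map[int(y) % height, int(x) % width] += 1.0
    # "wrap" makes the smoothing periodic, matching the 360-degree longitude axis.
    saliency = gaussian_filter(fixation_map, sigma=sigma, mode="wrap")
    if saliency.max() > 0:
        saliency /= saliency.max()  # normalize to [0, 1]
    return saliency

# Hypothetical fixations from several subjects, clustered near the front center
fixs = [(256, 128), (260, 130), (250, 125), (400, 60)]
smap = saliency_from_fixations(fixs)
```

In the multi-subject setting described above, the fixation lists predicted by each DRL stream would simply be concatenated before this smoothing step.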
