The Prediction of Saliency Map for Head and Eye Movements in 360 Degree Images

By recording the whole scene around the capture device, virtual reality (VR) techniques can provide viewers with a sense of presence. To deliver a satisfactory quality of experience, there should be at least 60 pixels per degree, matching normal human visual acuity of about one arcminute, so the resolution of a full panorama should reach 21600 × 10800 (360° × 60 by 180° × 60). Such a huge amount of data places great demands on processing and transmission. However, when exploring a virtual environment, viewers perceive only the content in the current field of view (FOV). Therefore, if head and eye movements, which are important viewing behaviors, can be predicted, more processing resources can be allocated to the active FOV. Conventional saliency prediction methods, however, are not fully adequate for panoramic images. In this paper, a new panorama-oriented model is proposed to predict head and eye movements. Owing to the advantages of computing directly in the spherical domain, spherical harmonics are employed to extract features at different frequency bands and orientations. Relevant low- and high-level features, including rare components in the frequency and color domains, the difference between central and peripheral vision, visual equilibrium, person and car detection, and the equator bias, are extracted to estimate saliency. To predict head movements, visual mechanisms including visual uncertainty and equilibrium are incorporated, and a graphical model and functional representation for the switching of head orientation are established. Extensive experimental results on a publicly available database demonstrate the effectiveness of the proposed methods.
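
As a rough illustration of two ingredients named above, the following Python sketch band-pass filters an equirectangular image by keeping only a range of spherical-harmonic degrees, and builds a Gaussian equator-bias prior over latitude. It uses scipy.special.sph_harm (SciPy convention: sph_harm(m, l, azimuth, polar); deprecated in newer SciPy in favor of sph_harm_y). The grid size, degree range, and the prior's spread sigma_deg are illustrative assumptions, not values or code from the paper.

    import numpy as np
    from scipy.special import sph_harm  # sph_harm(m, l, azimuth, polar)

    def sh_bandpass(img, l_min, l_max):
        """Keep only spherical-harmonic degrees l_min..l_max of an
        equirectangular grayscale image (a crude band-pass on the sphere)."""
        H, W = img.shape
        polar = (np.arange(H) + 0.5) * np.pi / H      # colatitude in (0, pi)
        azim = (np.arange(W) + 0.5) * 2 * np.pi / W   # longitude in (0, 2*pi)
        PH, TH = np.meshgrid(polar, azim, indexing="ij")
        # Midpoint-rule quadrature weights on the equirectangular grid.
        w = np.sin(PH) * (np.pi / H) * (2 * np.pi / W)
        out = np.zeros_like(img, dtype=complex)
        for l in range(l_min, l_max + 1):
            for m in range(-l, l + 1):
                Y = sph_harm(m, l, TH, PH)            # Y_l^m sampled on the grid
                coeff = np.sum(img * np.conj(Y) * w)  # forward transform (integral)
                out += coeff * Y                      # band-limited reconstruction
        return out.real

    def equator_bias(h, w, sigma_deg=20.0):
        """Gaussian latitude prior peaking at the equator; sigma_deg is an
        assumed spread, not a value from the paper."""
        lat = np.linspace(90.0, -90.0, h)             # degrees, top row = +90
        prior = np.exp(-lat ** 2 / (2.0 * sigma_deg ** 2))
        return np.tile(prior[:, None], (1, w))

    # Example: mid-frequency band-pass features weighted by the equator prior.
    img = np.random.rand(64, 128)                     # stand-in panorama
    feat = np.abs(sh_bandpass(img, l_min=4, l_max=8)) * equator_bias(64, 128)

Restricting the reconstruction to a degree band is the spherical analogue of a 2D band-pass filter, which is why such features avoid the distortion that planar filtering introduces near the poles of an equirectangular projection.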
