ScanGAN360: A Generative Model of Realistic Scanpaths for 360° Images

Understanding and modeling the dynamics of human gaze behavior in 360° environments is a key challenge in computer vision and virtual reality. Generative adversarial approaches could alleviate this challenge by generating a large number of possible scanpaths for unseen images. Existing methods for scanpath generation, however, do not adequately predict realistic scanpaths for 360° images. We present ScanGAN360, a new generative adversarial approach to address this problem. Our network generator is tailored to the specifics of 360° images representing immersive environments. Specifically, we use a spherical adaptation of dynamic time warping as a loss function and propose a novel parameterization of 360° scanpaths. Our generated scanpaths outperform those of competing approaches by a large margin in quality and are almost on par with the human baseline. ScanGAN360 thus allows fast simulation of large numbers of virtual observers whose behavior mimics that of real users, enabling a better understanding of gaze behavior and novel applications in virtual scene design.
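
To make the idea of a spherical adaptation of dynamic time warping concrete, the following is a minimal sketch, not the authors' implementation. It assumes scanpaths are lists of (latitude, longitude) pairs in radians and replaces the usual Euclidean point distance in classic DTW with the great-circle (haversine) distance. The function names `great_circle` and `spherical_dtw` are hypothetical, and a loss used for training would additionally need a differentiable formulation, which this sketch does not provide.

```python
import math


def great_circle(p, q):
    """Angular distance in radians between two (lat, lon) points in radians."""
    lat1, lon1 = p
    lat2, lon2 = q
    # Haversine formula; handles longitude wrap-around at +/- pi.
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def spherical_dtw(path_a, path_b):
    """Classic (non-differentiable) DTW cost using great-circle point distances.

    Assumption for illustration: each path is a non-empty list of
    (lat, lon) tuples in radians.
    """
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning the first i points of
    # path_a with the first j points of path_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = great_circle(path_a[i - 1], path_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat point of path_b
                                 cost[i][j - 1],      # repeat point of path_a
                                 cost[i - 1][j - 1])  # advance both paths
    return cost[n][m]
```

The point of using an angular distance is that two gaze points on either side of the ±π longitude seam are treated as close, whereas a Euclidean distance on the equirectangular image would place them almost a full image width apart.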
