Paper information - ScanGAN360: A Generative Model of Realistic Scanpaths for 360° Images

Understanding and modeling the dynamics of human gaze behavior in 360° environments is a key challenge in computer vision and virtual reality. Generative adversarial approaches could alleviate this challenge by generating a large number of possible scanpaths for unseen images. Existing methods for scanpath generation, however, do not adequately predict realistic scanpaths for 360° images. We present ScanGAN360, a new generative adversarial approach to address this problem. Our network generator is tailored to the specifics of 360° images representing immersive environments. Specifically, we use a spherical adaptation of dynamic time warping as a loss function and propose a novel parameterization of 360° scanpaths. The quality of our scanpaths outperforms competing approaches by a large margin and is almost on par with the human baseline. ScanGAN360 thus allows fast simulation of large numbers of virtual observers whose behavior mimics that of real users, enabling a better understanding of gaze behavior and novel applications in virtual scene design.
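
To make the idea of a spherical adaptation of dynamic time warping concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a scanpath is a list of (latitude, longitude) gaze points in radians and that the pointwise cost is the great-circle (haversine) distance; the function names `great_circle` and `spherical_dtw` are hypothetical. The abstract does not specify the paper's scanpath parameterization, so that part is not modeled here. This classic DTW recurrence is also not differentiable, so a training loss would presumably need a smoothed variant such as soft-DTW.

```python
import math


def great_circle(p, q):
    """Haversine distance (in radians) between two (lat, lon) points in radians."""
    lat1, lon1 = p
    lat2, lon2 = q
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def spherical_dtw(path_a, path_b):
    """Classic DTW alignment cost using great-circle distance as the pointwise cost."""
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning the first i points of path_a
    # with the first j points of path_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = great_circle(path_a[i - 1], path_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a point of path_b
                                 cost[i][j - 1],      # repeat a point of path_a
                                 cost[i - 1][j - 1])  # advance both paths
    return cost[n][m]
```

The great-circle cost is the reason for a spherical adaptation: two gaze points on either side of the longitude seam of an equirectangular image are close on the sphere, whereas a Euclidean distance on raw image coordinates would treat them as almost a full image width apart.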
