Gaze-FTNet: a feature transverse architecture for predicting gaze attention

The dynamics of gaze coordination in natural contexts are affected by various properties of the task, the agent, the environment, and their interaction. Artificial Intelligence (AI) lays the foundation for detection, classification, segmentation, and scene analysis, and much of the AI in everyday use is dedicated to predicting people's behavior. However, a purely data-driven approach cannot solve development problems alone. It is therefore imperative that decision-makers also consider another AI approach, causal AI, which can help identify the precise relationships of cause and effect. This article presents a novel Gaze Feature Transverse Network (Gaze-FTNet) that generates close-to-human gaze attention. The proposed end-to-end trainable approach leverages a feature transverse network (FTNet) to model long-term dependencies for optimal saliency map prediction. Moreover, several modern backbone architectures are explored, tested, and analyzed. Synthetically predicting human attention from monocular RGB images will benefit several domains, particularly human-vehicle interaction, autonomous driving, and augmented reality.
