An Element Sensitive Saliency Model with Position Prior Learning for Web Pages

Understanding human visual attention is important for multimedia applications. Many studies have attempted to build saliency prediction models for natural images. However, limited effort has been devoted to saliency prediction for Web pages, which are characterized by diverse content elements and spatial layouts. In this paper, we propose a novel end-to-end deep generative saliency model for Web pages. To capture the position biases introduced by page layouts, we propose a Position Prior Learning (PPL) sub-network, which models these biases with a variational auto-encoder. To model the different elements of a Web page, we introduce a Multi Discriminative Region Detection (MDRD) branch and a Text Region Detection (TRD) branch, which extract discriminative localizations and prominent text regions, respectively. We validate the proposed model on the public Web-page dataset 'FIWI' and show that it outperforms state-of-the-art models for Web-page saliency prediction.
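
As a rough illustration of the variational auto-encoder idea that the PPL sub-network builds on, the sketch below shows the two generic ingredients of a VAE objective: the reparameterized latent sample and the KL term against a standard normal prior. It is not the authors' implementation. The abstract does not specify the architecture, latent size, or loss weighting, so the function names and shapes here are assumptions made for illustration only.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

# Hypothetical usage: a batch of 2 latent codes with 4 dimensions each,
# standing in for an encoder's output on a position-bias map.
rng = np.random.default_rng(0)
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))
z = reparameterize(mu, log_var, rng)       # latent sample, shape (2, 4)
kl = kl_to_standard_normal(mu, log_var)    # one KL value per batch item
```

In a full VAE, the KL term would be added to a reconstruction loss between the decoded map and the target; how the paper combines these with the saliency loss is not stated in the abstract.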
