Two-Stage Transfer Learning of End-to-End Convolutional Neural Networks for Webpage Saliency Prediction

With the great success of convolutional neural networks (CNNs) on various computer vision tasks in recent years, CNNs have also been applied to natural image saliency prediction. As a specific type of visual stimulus, webpages exhibit evident similarities to, but also significant differences from, natural images. Consequently, a CNN trained for natural image saliency prediction cannot be directly used to predict webpage saliency. Only a few studies on webpage saliency prediction have been published to date. In this paper, we propose a simple yet effective scheme of two-stage transfer learning of end-to-end CNNs for webpage saliency prediction. In the first stage, the output layers of two typical CNN architectures, instantiated as AlexNet and VGGNet, are reconstructed, and the parameters of the fully connected layers are relearned from a large natural image database for image saliency prediction. In the second stage, the parameters of the fully connected layers are relearned from a scarce webpage database for webpage saliency prediction. The two stages can thus be regarded as a task transfer and a domain transfer, respectively. The experimental results indicate that the proposed two-stage transfer learning of end-to-end CNNs obtains a substantial performance improvement for webpage saliency prediction.
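The core idea of the scheme — train everything on a large source task, then freeze the learned feature layers and relearn only the output (fully connected) parameters on the scarce target domain — can be sketched conceptually with a toy two-layer network. All names, dimensions, learning rates, and synthetic data below are illustrative assumptions, not the paper's actual AlexNet/VGGNet setup or its saliency databases:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(W_feat, W_out, X, y, lr=0.01, steps=300, freeze_features=False):
    """Gradient descent on MSE for y ~ relu(X @ W_feat) @ W_out.

    With freeze_features=True, only the output layer W_out is updated,
    mirroring how the paper relearns the fully connected layers while
    keeping the transferred feature layers fixed.
    """
    for _ in range(steps):
        h = np.maximum(X @ W_feat, 0.0)           # feature layer (ReLU)
        pred = h @ W_out                           # output layer
        err = pred - y
        W_out -= lr * h.T @ err / len(X)
        if not freeze_features:
            grad_h = (err @ W_out.T) * (h > 0)
            W_feat -= lr * X.T @ grad_h / len(X)
    return W_feat, W_out

# Stage 1 (task transfer): train all parameters on a large surrogate
# "natural image saliency" task (synthetic stand-in data).
X_nat = rng.normal(size=(256, 8))
y_nat = np.maximum(X_nat @ rng.normal(size=(8, 4)), 0).sum(axis=1, keepdims=True)
W_feat = rng.normal(scale=0.1, size=(8, 4))
W_out = rng.normal(scale=0.1, size=(4, 1))
W_feat, W_out = train(W_feat, W_out, X_nat, y_nat)

# Stage 2 (domain transfer): keep the learned features frozen and relearn
# only the output layer on a much smaller surrogate "webpage" dataset.
X_web = rng.normal(size=(32, 8))
y_web = np.maximum(X_web @ rng.normal(size=(8, 4)), 0).sum(axis=1, keepdims=True)
W_feat_frozen = W_feat.copy()
W_feat, W_out = train(W_feat, W_out, X_web, y_web, freeze_features=True)
assert np.allclose(W_feat, W_feat_frozen)  # features untouched in stage 2
```

In a deep-learning framework the same two stages would amount to loading ImageNet-pretrained weights, replacing the classification head with a saliency output, fine-tuning on the large natural image saliency set, and then fine-tuning only the head on the small webpage set.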
