Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods

Modeling visual search not only offers an opportunity to predict the usability of an interface before testing it on real users but also advances scientific understanding of human behavior. In this work, we first conduct a set of analyses on a large-scale dataset of visual search tasks on realistic webpages. We then present a deep neural network that learns to predict the scannability of webpage content, i.e., how easy it is for a user to find a specific target. Our model leverages both heuristic-based features, such as target size, and unstructured features, such as raw image pixels. This approach allows us to model complex interactions that may arise in a realistic visual search task and that cannot be captured by traditional analytical models. We analyze the model's behavior to offer insights into how the saliency map it learns aligns with human intuition.
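To make the two-stream idea concrete, the sketch below shows one plausible way to combine a convolutional encoder over raw screenshot pixels with a small dense branch over heuristic target features (e.g., target size and position) to regress a scalar scannability score such as expected search time. This is a minimal illustration under assumed input shapes, layer sizes, and feature choices, not the architecture used in the paper.

```python
# Minimal sketch (assumed architecture, not the paper's model): a network that
# fuses raw webpage pixels with heuristic target features to predict a scalar
# scannability score (e.g., expected search time). Shapes and sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, Model


def build_scannability_model(img_height=192, img_width=256, num_heuristics=4):
    # Branch 1: convolutional encoder over the raw webpage screenshot.
    pixels = layers.Input(shape=(img_height, img_width, 3), name="screenshot")
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(pixels)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Branch 2: heuristic features of the search target (e.g., size, x/y position).
    heuristics = layers.Input(shape=(num_heuristics,), name="target_features")
    h = layers.Dense(32, activation="relu")(heuristics)

    # Fuse the two branches and regress a single scannability score.
    merged = layers.Concatenate()([x, h])
    merged = layers.Dense(64, activation="relu")(merged)
    merged = layers.Dropout(0.5)(merged)
    score = layers.Dense(1, name="scannability")(merged)

    model = Model(inputs=[pixels, heuristics], outputs=score)
    model.compile(optimizer="adam", loss="mse")
    return model


if __name__ == "__main__":
    build_scannability_model().summary()
```

A model of this shape can be trained directly on (screenshot, target-feature, search-time) triples; the convolutional branch is also where a learned saliency-like map could be inspected, e.g., via gradient-based attribution over the pixel input.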
