A soft barrier model for predicting human visuomotor behavior in a driving task

Leif Johnson (leif@cs.utexas.edu), Department of Computer Science, University of Texas at Austin, USA
Brian Sullivan (brians@ski.org), Smith-Kettlewell Eye Research Institute, San Francisco, CA, USA
Mary Hayhoe (mary@mail.cps.utexas.edu), Department of Psychology, University of Texas at Austin, USA
Dana Ballard (dana@cs.utexas.edu), Department of Computer Science, University of Texas at Austin, USA

Abstract

We present a task-based model of human gaze allocation in a driving environment. When engaged in natural tasks, gaze is predominantly directed towards task-relevant objects. In particular, in a multi-task scenario such as driving, human drivers must access multiple perceptual cues that can be used for effective control. Our model uses visual task modules that require multiple independent sources of information for control, analogous to human foveation on different task-relevant objects. Building on the framework described by Sprague and Ballard (2003), we use a modular structure to feed information to a set of PID controllers that drive a simulated car, and we introduce a barrier model for gaze selection. The softmax barrier model uses performance thresholds to represent task importance across modules and allows noise to be added to any module to represent task uncertainty. Results from the model compare favorably with human gaze data gathered from subjects driving in a virtual environment.

Keywords: Visual attention; eye movements.

Introduction

Humans routinely interact with complex, noisy, dynamic environments to accomplish tasks in the world. For example, while driving a car, a person navigates to a desired destination (e.g., a grocery store) while paying attention to different types of objects in the environment (pedestrians, vehicles, etc.) and obeying traffic laws (speed limit, stop signs, etc.).
Humans are able to balance competing task demands while simultaneously gathering information from the world through a foveated visual system, which must be actively moved to different targets to obtain high-resolution imagery. During the deployment of attention, and in particular during overt eye movements towards an object, humans are sensitive to bottom-up salience (color, motion, etc.) as well as top-down task priority and the rewards associated with a task (Knudsen, 2007; Wolfe, Butcher, Lee, & Hyle, 2003). In particular, when engaged in "natural" tasks, eye movements are largely directed towards task-relevant objects (Hayhoe & Ballard, 2005; Land & Hayhoe, 2001). Typically in natural environments, there are multiple task-relevant objects spread over space and time that require active visual strategies to properly gather information. While human vision research has often focused on models of visual saliency, i.e., a stimulus-based controller of attention (Bruce & Tsotsos, 2009; Itti & Koch, 2001; Zhang, Tong, Marks, Shan, & Cottrell, 2008), such models are inappropriate for addressing task-based behavior because they do not incorporate information about the state of the agent whose vision is being modeled. An alternative approach is to consider vision as part of a control process where information from the senses is used to guide motor behavior (Butko & Movellan, 2010; Nunez-Varela, Ravindran, & Wyatt, 2012; Senders, 1980; Sprague & Ballard, 2003; Sullivan, Johnson, Ballard, & Hayhoe, 2011). Both stimulus- and task-based approaches have led to a variety of formulations concerning how eye movements should be selected, e.g., using energy models, information-theoretic measures, or measures of reward and uncertainty.

In the present work, we focus on how the selection of eye movement targets may be controlled in part by task-related uncertainty and reward. We present a model of visual processing and control that simultaneously takes into account the reward and uncertainty in multiple tasks associated with a dynamic, noisy driving environment. The model successfully accounts for variations in gaze deployment seen in humans driving in a virtual reality driving environment. Additionally, we discuss future research allowed by inversion of the soft barrier model. Inversion allows human data to be mapped into parameters in the model space so that it can be understood and compared quantitatively within the model framework.

Model

The model proposed in this paper follows the modular architecture of Sprague and Ballard (2003) by factoring complex behaviors like driving into a set of simple control modules that each focus on a well-defined task, for example a module to follow the road and another to avoid oncoming cars. Intuitively, a module is an abstract black-box controller that can be used alone to guide an agent through a single task. More interestingly, modules can be used together dynamically to engage in multiple ongoing behaviors. While the human visual system is highly parallel, processing and attentional focus are largely biased towards the fovea, meaning humans typically get information in a serial fashion by foveating different objects over time. In our model, multiple task modules run concurrently; however, to incorporate the foveation constraint, only one module at a time actively gains new perceptual information.
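To make this division of labor concrete, the sketch below illustrates how such task modules and a softmax-style gaze arbitration rule might be wired together. It is a minimal illustration under our own assumptions, not the implementation used in the paper: the module names, PID gains, the uncertainty bookkeeping, the variance-to-threshold score, and the toy world state are all introduced here for exposition.

import numpy as np

class TaskModule:
    """One task module (e.g., road following or car avoidance): a noisy
    estimate of task-relevant state plus a PID controller over that estimate.
    Gains, thresholds, and noise magnitudes are illustrative only."""

    def __init__(self, kp, ki, kd, threshold, drift_std):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.threshold = threshold   # performance threshold encoding task importance
        self.drift_std = drift_std   # how fast uncertainty grows when not foveated
        self.estimate = 0.0          # current estimate of the task error
        self.variance = 1.0          # uncertainty about that estimate
        self.integral = 0.0
        self.prev_error = 0.0

    def observe(self, true_error, obs_std=0.05):
        """Foveating this task replaces the estimate with a fresh, low-noise observation."""
        self.estimate = true_error + np.random.normal(0.0, obs_std)
        self.variance = obs_std ** 2

    def propagate(self):
        """When not foveated, the estimate goes stale: uncertainty accumulates."""
        self.variance += self.drift_std ** 2

    def control(self, dt=0.1):
        """PID command computed from the (possibly stale) error estimate."""
        err = self.estimate
        self.integral += err * dt
        derivative = (err - self.prev_error) / dt
        self.prev_error = err
        return self.kp * err + self.ki * self.integral + self.kd * derivative


def select_gaze(modules, temperature=1.0):
    """Softmax arbitration over modules: the larger a module's uncertainty is
    relative to its performance threshold, the more likely it is to be
    foveated next (a stand-in for the paper's softmax barrier rule)."""
    scores = np.array([m.variance / m.threshold for m in modules])
    weights = np.exp(scores / temperature)
    return np.random.choice(len(modules), p=weights / weights.sum())


# Toy simulation loop: two modules, only one receives a new observation per step.
follow_road = TaskModule(kp=1.0, ki=0.1, kd=0.2, threshold=0.5, drift_std=0.1)
avoid_car = TaskModule(kp=0.8, ki=0.0, kd=0.3, threshold=1.0, drift_std=0.2)
modules = [follow_road, avoid_car]
true_errors = [0.3, -0.6]            # placeholder world state for each task

for step in range(20):
    fixated = select_gaze(modules)
    for i, module in enumerate(modules):
        if i == fixated:
            module.observe(true_errors[i])
        else:
            module.propagate()
    commands = [module.control() for module in modules]
    # In the full model these commands would be blended to steer the simulated car.

In this sketch every module issues a control command on every step, but only the fixated module's estimate is refreshed; the uncertainty of the other modules grows until the arbitration rule pulls gaze back to them. This serial refresh under a shared foveation constraint is the qualitative behavior the softmax barrier model is meant to capture.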
References

[1] Dana H. Ballard, et al. Eye Movements for Reward Maximization. NIPS, 2003.
[2] John K. Tsotsos, et al. Saliency, attention, and visual search: an information theoretic approach. Journal of Vision, 2009.
[3] D. Ballard, et al. Eye movements in natural behavior. Trends in Cognitive Sciences, 2005.
[4] Neil J. Gordon, et al. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 2002.
[5] Balaraman Ravindran, et al. Gaze Allocation Analysis for a Visually Guided Manipulation Task. SAB, 2012.
[6] Aidan O'Dwyer, et al. Handbook of PI and PID Controller Tuning Rules. 2003.
[7] C. Koch, et al. Computational modelling of visual attention. Nature Reviews Neuroscience, 2001.
[8] J. Wolfe, et al. Changing your mind: on the contributions of top-down and bottom-up guidance in visual search for feature singletons. Journal of Experimental Psychology: Human Perception and Performance, 2003.
[9] M. Hayhoe, et al. In what ways do eye movements contribute to everyday activities? Vision Research, 2001.
[10] D. Ballard, et al. The role of uncertainty and reward on eye movements in a virtual driving task. Journal of Vision, 2012.
[11] B. Tatler, et al. The prominence of behavioural biases in eye guidance. 2009.
[12] Mary Hayhoe, et al. A modular reinforcement learning model for human visuomotor behavior in a driving task. 2011.
[13] Tim K. Marks, et al. SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 2008.
[14] R. H. S. Carpenter, et al. Neural computation of log likelihood in control of saccadic eye movements. Nature, 1995.
[15] E. Knudsen. Fundamental components of attention. Annual Review of Neuroscience, 2007.
[16] Javier R. Movellan, et al. Detecting contingencies: An infomax approach. Neural Networks, 2010.