Target-directed attention: Sequential decision-making for gaze planning

It is widely agreed that efficient visual search requires the integration of target-driven top-down information and image-driven bottom-up information. Yet the problem of gaze planning, that is, selecting the next best gaze location given the current observations, remains largely unsolved. We propose a probabilistic system that models the gaze sequence as a finite-horizon Bayesian sequential decision process. Direct policy search is used to reason about the next best gaze locations. The system integrates bottom-up saliency information, top-down target knowledge and additional context information through principled Bayesian priors. This results in proposed gaze locations that depend not only on featural visual saliency, but also on prior knowledge and the spatial likelihood of locating the target. The system has been implemented using state-of-the-art object detectors and evaluated on a real-world dataset by comparing it to gaze sequences proposed by a pure bottom-up saliency-based process and to an object detection approach that analyzes the full image. The target-directed attention system is shown to result in higher object detection precision than both competitors, to attend to more relevant targets than the bottom-up attention system, and to require significantly less computation time than the exhaustive approach.
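
To make the idea of fusing bottom-up and top-down cues concrete, the following is a minimal illustrative sketch, not the system described above. It assumes three small grid maps over candidate gaze locations (a saliency map, a target likelihood map, and a context prior), treats them as independent factors, multiplies them cell by cell, and normalizes the result. It then picks gaze locations greedily over a fixed horizon, which is a simplification standing in for the finite-horizon policy search. All function names and numbers are hypothetical.

```python
# Illustrative sketch only: cue fusion by a normalized product, followed by a
# greedy gaze sequence. The greedy rule is an assumption made for brevity; it
# is NOT the direct policy search used by the system in the abstract.

def normalize(grid):
    """Scale a 2D grid of non-negative values so that it sums to 1."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        raise ValueError("map has no positive mass")
    return [[value / total for value in row] for row in grid]


def gaze_posterior(saliency, target_likelihood, context_prior):
    """Combine three equally sized maps by cell-wise product, then normalize.

    Assumes the cues are independent given the target location.
    """
    rows, cols = len(saliency), len(saliency[0])
    product = [
        [saliency[r][c] * target_likelihood[r][c] * context_prior[r][c]
         for c in range(cols)]
        for r in range(rows)
    ]
    return normalize(product)


def next_gaze(posterior, visited=()):
    """Return the unvisited cell with the highest probability, or None."""
    best, best_p = None, -1.0
    for r, row in enumerate(posterior):
        for c, p in enumerate(row):
            if (r, c) not in visited and p > best_p:
                best, best_p = (r, c), p
    return best


def plan_gaze_sequence(posterior, horizon):
    """Greedily select up to `horizon` distinct gaze locations."""
    visited = []
    for _ in range(horizon):
        location = next_gaze(posterior, visited)
        if location is None:
            break
        visited.append(location)
    return visited


# Hypothetical 2x2 maps: saliency alone favours cell (0, 1), but the target
# likelihood and the context prior shift the first fixation to cell (1, 0).
saliency = [[0.1, 0.6], [0.2, 0.1]]
target_likelihood = [[0.2, 0.1], [0.6, 0.1]]
context_prior = [[0.1, 0.1], [0.4, 0.4]]

posterior = gaze_posterior(saliency, target_likelihood, context_prior)
sequence = plan_gaze_sequence(posterior, horizon=3)
print(sequence)
```

In this toy setting the fused map sends the first fixation to a location that saliency alone would rank second, which mirrors the claim that proposed gaze locations depend on more than featural saliency.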
