A Computational Focus of Attention Mechanism to Process Shapes Efficiently: Theory

D. Rother · R. Vidal, Johns Hopkins University. Tel.: +1-410-516-6736. E-mail: diroth@gmail.com
S. Schutz, University of Göttingen

Given the ever-increasing bandwidth of the visual sensory information available to autonomous agents and other automatic systems, it is becoming essential to endow them with a sense of what is worth their attention and what can be safely disregarded. This article presents a general mathematical framework to efficiently allocate the available computational resources to the parts of the input that are relevant to solving a perceptual problem of interest. By solving a perceptual problem we mean finding the hypothesis H (i.e., the state of the world) that maximizes a function L(H), referred to as the evidence, representing how well each hypothesis “explains” the input. However, given the large bandwidth of the sensory input, fully evaluating the evidence for each hypothesis is computationally infeasible (e.g., because it would imply checking a large number of pixels). To address this problem we propose a mathematical framework with two key ingredients. The first is a Bounding Mechanism (BM) that computes lower and upper bounds of the evidence of a hypothesis for a given computational budget. These bounds are much cheaper to compute than the evidence itself, can be refined at any time by increasing the budget allocated to a hypothesis, and are frequently sufficient to discard a hypothesis. The second ingredient is a Focus of Attention Mechanism (FoAM) that selects which hypothesis' bounds should be refined next, with the goal of discarding non-optimal hypotheses with the least amount of computation. The proposed framework has the following desirable characteristics: 1) it is very efficient, since most hypotheses are discarded with minimal computation; 2) it is parallelizable; 3) it is guaranteed to find the globally optimal hypothesis or hypotheses; and 4) its running time depends on the problem at hand, not on the bandwidth of the input. To illustrate the general framework, in this article we instantiate it for the problem of simultaneously estimating the class, pose, and a noiseless version of a 2D shape in a 2D image. To do this, we develop a novel theory of semidiscrete shapes that allows us to compute the bounds required by the BM. We believe that the theory presented in this article (i.e., the algorithmic paradigm and the theory of shapes) has multiple potential applications well beyond the one demonstrated here.
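To make the interplay between the BM and the FoAM concrete, the following is a minimal toy sketch in Python, not the method developed in the article. It rests on several assumptions that do not come from the text: the evidence L(H) is taken to be a sum of terms that each lie in [0, 1], the "budget" of a hypothesis is the number of terms evaluated so far, and the selection rule (refine the surviving hypothesis with the largest upper bound) is a simple best-first heuristic chosen for illustration. The function name `find_optimal_hypotheses` and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def find_optimal_hypotheses(
    terms: Dict[str, Sequence[float]], step: int = 1
) -> Tuple[List[str], int]:
    """Toy bound-and-discard search (illustrative assumptions only).

    terms[h] holds the per-element contributions to L(h), each assumed to lie
    in [0, 1]. Returns the hypotheses that maximize L and the number of terms
    that had to be evaluated to prove it.
    """
    if not terms:
        return [], 0

    evaluated = {h: 0 for h in terms}   # budget spent on each hypothesis
    partial = {h: 0.0 for h in terms}   # sum of the terms evaluated so far
    active = sorted(terms)              # hypotheses not yet discarded
    cost = 0

    def lower(h: str) -> float:
        # Unevaluated terms contribute at least 0 each.
        return partial[h]

    def upper(h: str) -> float:
        # Unevaluated terms contribute at most 1 each.
        return partial[h] + (len(terms[h]) - evaluated[h])

    while True:
        # Discard every hypothesis that cannot beat the best lower bound.
        best_lower = max(lower(h) for h in active)
        active = [h for h in active if upper(h) >= best_lower]

        refinable = [h for h in active if evaluated[h] < len(terms[h])]
        if len(active) == 1 or not refinable:
            return active, cost

        # Assumed selection rule: refine the most promising upper bound.
        h = max(refinable, key=upper)
        start = evaluated[h]
        chunk = terms[h][start:start + step]
        partial[h] += sum(chunk)
        evaluated[h] += len(chunk)
        cost += len(chunk)


if __name__ == "__main__":
    toy = {"A": [1, 1, 1, 1], "B": [0, 0, 0, 0], "C": [1, 0, 0, 0]}
    print(find_optimal_hypotheses(toy))  # (['A'], 7)
```

In this toy run the optimum is identified after evaluating 7 of the 12 available terms, because B and C are discarded as soon as their upper bounds fall below the lower bound of A. Hypotheses with tied evidence are all returned, which mirrors the "hypothesis or hypotheses" wording of the guarantee.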
