Biased Competition in Visual Processing Hierarchies: A Learning Approach Using Multiple Cues

In this contribution, we present a large-scale hierarchical system for object detection that fuses bottom-up (signal-driven) processing results with top-down (model- or task-driven) attentional modulation. Specifically, we focus on two questions: how the autonomous learning of invariant models can be embedded into a working system, and how such models can be used to define object-specific attentional modulation signals. Our system implements bidirectional data flow in a processing hierarchy. The bottom-up data flow proceeds from a preprocessing level to the hypothesis level, where object hypotheses created by exhaustive object detection algorithms are represented in a roughly retinotopic way. A competitive selection mechanism determines the most confident hypotheses, which are then used at the system level to train multimodal models that link object identity to invariant hypothesis properties. The top-down data flow originates at the system level, where the trained multimodal models are used to obtain space- and feature-based attentional modulation signals that bias the competitive selection process at the hypothesis level. This results in object-specific facilitation or suppression of hypotheses in certain image regions, which we show to be applicable to different object detection mechanisms. To demonstrate the benefits of this approach, we apply the system to the detection of cars in a variety of challenging traffic videos. Evaluating our approach on a publicly available dataset containing approximately 3,500 annotated video images from more than 1 h of driving, we show strong increases in performance and generalization compared to object detection in isolation. Furthermore, we compare our results to a late hypothesis-rejection approach, showing that early coupling of top-down and bottom-up information is favorable, especially when processing resources are constrained.
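The contrast between early coupling (biasing the competition) and late hypothesis rejection can be illustrated with a small sketch. This is not the system described here: it assumes that the top-down signal multiplies the bottom-up detector confidence, and that the spatial and feature biases are fixed Gaussians over vertical image position and box aspect ratio rather than learned multimodal models. The names `Hypothesis`, `top_down_bias`, `early_coupling` and `late_rejection`, the resource budget `k`, the rejection `threshold`, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    x: float       # horizontal image position (pixels)
    y: float       # vertical image position (pixels)
    aspect: float  # bounding-box width / height
    score: float   # bottom-up detector confidence in [0, 1]


def gaussian(value: float, mean: float, std: float) -> float:
    """Unnormalized Gaussian, used as a soft bias with values in (0, 1]."""
    return math.exp(-0.5 * ((value - mean) / std) ** 2)


def top_down_bias(h: Hypothesis, y_mean: float = 300.0, y_std: float = 60.0,
                  aspect_mean: float = 1.3, aspect_std: float = 0.3) -> float:
    # Hypothetical space-based bias: cars are expected in a band around y_mean.
    spatial = gaussian(h.y, y_mean, y_std)
    # Hypothetical feature-based bias: cars have a typical aspect ratio.
    feature = gaussian(h.aspect, aspect_mean, aspect_std)
    return spatial * feature


def early_coupling(hyps: list[Hypothesis], k: int) -> list[Hypothesis]:
    """Biased competition: modulate scores first, then keep the k winners."""
    ranked = sorted(hyps, key=lambda h: h.score * top_down_bias(h), reverse=True)
    return ranked[:k]


def late_rejection(hyps: list[Hypothesis], k: int,
                   threshold: float = 0.5) -> list[Hypothesis]:
    """Keep the k winners on bottom-up score alone, then reject implausible ones."""
    ranked = sorted(hyps, key=lambda h: h.score, reverse=True)[:k]
    return [h for h in ranked if top_down_bias(h) >= threshold]


# Made-up example: two strong clutter responses and one weaker, plausible car.
example = [
    Hypothesis(x=100, y=50, aspect=3.0, score=0.9),    # clutter high in the image
    Hypothesis(x=400, y=80, aspect=0.4, score=0.85),   # clutter with implausible shape
    Hypothesis(x=250, y=310, aspect=1.25, score=0.6),  # plausible car
]

# With a budget of one hypothesis, early coupling keeps the plausible car,
# whereas late rejection spends the single slot on clutter and then discards it.
print(early_coupling(example, k=1))
print(late_rejection(example, k=1))
```

Under these assumptions, the benefit of early coupling appears only when `k` is smaller than the number of strong but implausible bottom-up responses; with an unlimited budget, both strategies would eventually retain the plausible hypothesis.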
