WWN-2: A biologically inspired neural network for concurrent visual attention and recognition

Attention and recognition have been addressed separately as two challenging problems in computational vision, but an engineering-grade solution to their integration and interaction remains open. Inspired by the dorsal and ventral pathways of cortical visual processing in the brain, we present a neuromorphic architecture, called Where-What Network 2 (WWN-2), that integrates object attention and recognition interactively through experience-based development. The architecture enables three types of attention: feature-based bottom-up attention, position-based top-down attention, and object-based top-down attention, realized as three possible information flows through the Y-shaped network. The learning mechanism of the network is rooted in a simple but efficient cell-centered synaptic update model, which entails the dual optimization of Hebbian directions and firing-age-dependent step sizes. The inputs to the network are a sequence of images in which specific foreground objects may appear anywhere within an unknown, complex, natural background. In a supervised learning mode, WWN-2 dynamically establishes and consolidates position-specified and type-specified representations. The network reached a 92.5% object recognition rate and an average position error of 1.5 pixels after 20 epochs of training.
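
Below is a minimal sketch, in Python, of the cell-centered synaptic update described above. It assumes the amnesic-average form used in Lobe Component Analysis; the function names, the amnesic schedule, and its parameters are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def amnesic_mu(n, t1=20, t2=200, c=2.0, r=10000.0):
    """Amnesic function mu(n) of a neuron's firing age n.

    The piecewise schedule and its parameters are illustrative only.
    """
    if n < t1:
        return 0.0
    if n < t2:
        return c * (n - t1) / (t2 - t1)
    return c + (n - t2) / r

def cell_centered_update(w, x, y, n):
    """One cell-centered Hebbian update for a single neuron.

    w : current synaptic weight vector
    x : pre-synaptic input vector (bottom-up or top-down)
    y : the neuron's firing response to x
    n : the neuron's firing age (number of updates so far, n >= 1)
    """
    # Firing-age-dependent step size; retention and learning rates sum to 1.
    lr = (1.0 + amnesic_mu(n)) / n
    # Step along the Hebbian direction y * x.
    w_new = (1.0 - lr) * w + lr * y * x
    return w_new, n + 1
```

In this sketch the Hebbian direction is the product of pre-synaptic input and post-synaptic response, and the step size depends only on the neuron's own firing age, so each cell keeps its own learning schedule without a global learning rate.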
