Toward Autonomous Intelligence: From Active 3D Vision to Invariant Object and Scene Learning, Recognition, and Search

How do we learn what a visually seen object is? How do our brains learn, without supervision, to link multiple views of the same object into an invariant object category while our eyes scan a scene, even before we have a concept of the object? Indeed, why do we not link together views of different objects when there is no teacher to correct us? Why do our eyes not move around randomly? How do they explore salient features of novel objects and thereby enable us to learn view-, size-, and position-invariant object categories? How do representations of a scene remain binocularly fused as our eyes explore it? How do we solve the Where’s Waldo problem and thereby efficiently search for desired objects in a scene? This article summarizes the ARTSCAN and ARTSCENE families of neural models, culminating in the 3D ARTSCAN Search model, which clarifies how the brain solves these problems in a unified way by coordinating processes of 3D vision and figure-ground separation, spatial and object attention, object and scene category learning, predictive remapping, and eye movement search. ARTSCAN illustrates revolutionary new computational paradigms for how the brain computes: Complementary Computing clarifies the nature of brain specialization, and Laminar Computing clarifies why all neocortical circuits exhibit a layered architecture. ARTSCAN also provides unified explanations and simulations of brain and behavioral data, along with computer simulation benchmarks that support the model. The model thereby provides a blueprint for developing a new type of system for active vision and autonomous learning, recognition, search, and robotics.
