Types, Locations, and Scales from Cluttered Natural Video and Actions

We model the autonomous development of brain-inspired circuits through two modalities, a video stream and an action stream, that are synchronized in time. We assume that such multimodal streams are available to a baby through inborn reflexes, self-supervision, and a caretaker's supervision when the baby interacts with the real world. By autonomous development, we mean not only that the internal (inside the “skull”) self-organization is fully autonomous, but also that the developmental program (DP) that regulates the computation of the network is task-nonspecific. In this work, task-nonspecificity is reflected by the fact that the actions associated with an attended object in a cluttered, natural, and dynamic scene are taught after the DP is finished and “life” has begun. The actions correspond to neuronal firing patterns representing object type, object location, and object scale, but learning proceeds directly from unsegmented cluttered scenes. Within the line of where-what networks (WWNs), this is the first network that explicitly models multiple “brain” areas, each for a different range of object scales. Among the experiments, large-scale natural-video experiments were conducted. To show the power of automatic attention in unknown cluttered backgrounds, the last group of experiments performed disjoint tests in the presence of large within-class variations (3-D object rotations in very different unknown backgrounds) but small between-class variations (small object patches in large similar and different unknown backgrounds), in contrast with global classification tests such as ImageNet and Atari games.
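As a rough illustration of what "firing patterns representing object type, object location, and object scale" could look like, the sketch below encodes one taught action as three one-hot firing vectors and assigns the object to one of several scale-range areas. It is a minimal sketch under stated assumptions, not the authors' network: the abstract does not specify the encoding, and every size, range, and name here (`NUM_TYPES`, `GRID`, `SCALE_RANGES`, `Action`, `encode_action`) is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sizes; the abstract does not state them.
NUM_TYPES = 5                                     # number of object types
GRID = (4, 4)                                     # coarse grid of image locations (rows, cols)
SCALE_RANGES = [(16, 32), (32, 64), (64, 128)]    # pixel-size ranges, one per scale area


def one_hot(index: int, size: int) -> list[float]:
    """Firing pattern with exactly one active neuron."""
    v = [0.0] * size
    v[index] = 1.0
    return v


@dataclass
class Action:
    """A taught action for the attended object (hypothetical fields)."""
    type_id: int
    row: int
    col: int
    size_px: int


def scale_index(size_px: int) -> int:
    """Pick the scale-range area responsible for an object of this size."""
    for i, (lo, hi) in enumerate(SCALE_RANGES):
        if lo <= size_px < hi:
            return i
    raise ValueError(f"object size {size_px} is outside the modeled ranges")


def encode_action(a: Action) -> dict[str, list[float]]:
    """Encode type, location, and scale as separate one-hot firing vectors."""
    rows, cols = GRID
    return {
        "type": one_hot(a.type_id, NUM_TYPES),
        "location": one_hot(a.row * cols + a.col, rows * cols),
        "scale": one_hot(scale_index(a.size_px), len(SCALE_RANGES)),
    }


enc = encode_action(Action(type_id=2, row=1, col=3, size_px=40))
print(enc["location"].index(1.0), enc["scale"].index(1.0))
```

Keeping type, location, and scale as separate vectors mirrors the abstract's statement that each action is a set of firing patterns, one per property, while `scale_index` stands in for routing an object to the area that covers its scale range.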
