WWN-8: Incremental Online Stereo with Shape-from-X Using Life-Long Big Data from Multiple Modalities

Abstract: When a child lives in the real world, from infancy to adulthood, his retinae receive a continuous flood of stereo sensory input, and his muscles produce a corresponding stream of actions. How does the child's brain deal with such big data from multiple sensory modalities (left- and right-eye modalities) and multiple effector modalities (location, disparity map, and shape type)? The brain acquires this capability incrementally, learning to produce sensorimotor behaviors that progress from simple to complex, a process known as autonomous development. We present a model that incrementally fuses such an open-ended, life-long stream and updates the "brain" online so that the perceived world is 3D. Traditional shape-from-X methods use one particular type of cue X (e.g., stereo disparity or shading) to compute depths or local shapes based on a handcrafted physical model. Such a model is likely to yield a brittle system because the availability of the cue fluctuates. An embodiment of the Developmental Network (DN), called the Stereo Where-What Network (WWN-8), learns to perform attention and recognition simultaneously while developing invariances in location, disparity, shape, and surface type, so that other cues can automatically fill in when a particular type of cue (e.g., texture) is locally missing from the real world. We report three experiments: (1) dynamic synapse retraction and growth as a method of developing receptive fields; (2) training to recognize 3D objects directly in cluttered natural backgrounds; and (3) integration of depth perception with location and type information. The experiments used stereo images and motor actions on the order of 10^5 frames. Potential applications include driver assistance for road safety, mobile robots, autonomous navigation, and autonomous vision-guided manipulators.
