DeepEfference: Learning to predict the sensory consequences of action through deep correspondence

As the eye saccades across a visual scene, humans maintain egocentric visual positional constancy despite retinal motion identical to that produced by an egocentric shift of the scene itself. Characterizing the biological computations underlying this constancy can inform methods of robotic localization by serving as a model for intelligently integrating complementary, heterogeneous information. Here we present DeepEfference, a bio-inspired, unsupervised, deep sensorimotor network that learns to predict the sensory consequences of self-generated actions. DeepEfference computes dense image correspondences [1] at over 500 Hz using only a single monocular grayscale image and a low-dimensional, extra-modal motion estimate as data inputs. Designed for robotic applications, DeepEfference employs multi-level fusion via two parallel pathways to learn dense, pixel-level predictions and correspondences between source and target images. We present quantitative and qualitative results on the SceneNet RGBD [2] and KITTI Odometry [3] datasets and demonstrate an approximately 200x reduction in runtime with only a 12% increase in mean pixel matching error relative to DeepMatching [4] on KITTI Odometry.
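The core idea the abstract describes, predicting a target image by combining a source image with a low-dimensional motion estimate, can be sketched in a toy form. The snippet below is an illustrative stand-in, not the paper's architecture: `motion_to_flow` replaces DeepEfference's learned motion pathway with a hypothetical pure-translation mapping, and warping uses nearest-neighbour rather than bilinear sampling. All function names here are invented for illustration.

```python
import numpy as np

def warp_with_flow(source, flow):
    """Backward-warp a grayscale image by a dense per-pixel flow field:
    each target pixel samples the source at its corresponding location
    (nearest-neighbour sampling, clipped at image borders)."""
    h, w = source.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return source[src_y, src_x]

def motion_to_flow(motion, h, w):
    """Toy 'motion pathway': expand a low-dimensional motion estimate
    (here just a 2-D translation in pixels) into a dense flow field.
    In the real network this expansion is learned; a constant
    translation is an illustrative assumption."""
    flow = np.zeros((h, w, 2))
    flow[..., 0] = motion[0]  # horizontal component
    flow[..., 1] = motion[1]  # vertical component
    return flow

# A tiny source image and an extra-modal motion estimate
# (a 1-pixel horizontal shift, as a camera pan might induce).
source = np.arange(16.0).reshape(4, 4)
flow = motion_to_flow(np.array([1.0, 0.0]), 4, 4)
predicted_target = warp_with_flow(source, flow)
# Unsupervised training would minimize the photometric error between
# predicted_target and the actually observed target frame.
```

The unsupervised character of the approach shows up in the last comment: no ground-truth correspondences are needed, only the source frame, the motion estimate, and the observed target frame.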

[1] Yoshua Bengio et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.

[2] P. Anandan et al. A Unified Approach to Moving Object Detection in 2D and 3D Scenes, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[3] Geoffrey E. Hinton et al. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines, 2010, Neural Computation.

[4] E. von Holst et al. The Principle of Reafference: Interactions Between the Central Nervous System and the Peripheral Organs, 2011.

[5] R. Sperry. Neural basis of the spontaneous optokinetic response produced by visual inversion, 1950, Journal of Comparative and Physiological Psychology.

[6] Donald Perlis et al. Who's Talking? - Efference Copy and a Robot's Sense of Agency, 2015, AAAI Fall Symposia.

[7] Geoffrey E. Hinton et al. Factored 3-Way Restricted Boltzmann Machines for Modeling Natural Images, 2010, AISTATS.

[8] D. Faber et al. The Mauthner Cell Half a Century Later: A Neurobiological Model for Decision-Making?, 2005, Neuron.

[9] Stefan Leutenegger et al. SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth, 2016, ArXiv.

[10] Cordelia Schmid et al. DeepMatching: Hierarchical Deformable Dense Matching, 2015, International Journal of Computer Vision.

[11] Jitendra Malik et al. View Synthesis by Appearance Flow, 2016, ECCV.

[12] Christopher K. I. Williams et al. Transformation Equivariant Boltzmann Machines, 2011, ICANN.

[13] Giorgio Metta et al. A heteroscedastic approach to independent motion detection for actuated visual sensors, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14] Trevor Darrell et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[15] Donald Perlis et al. Reasoning with Grounded Self-Symbols for Human-Robot Interaction, 2016, AAAI Fall Symposia.

[16] Randal C. Nelson. Qualitative detection of motion by a moving observer, 2004, International Journal of Computer Vision.

[17] Bruce Bridgeman et al. A theory of visual stability across saccadic eye movements, 1994, Behavioral and Brain Sciences.

[18] Geoffrey E. Hinton et al. Transforming Autoencoders, 2011.

[19] Richard Szeliski et al. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, 2001, International Journal of Computer Vision.

[20] Geoffrey E. Hinton et al. Unsupervised Learning of Image Transformations, 2007, IEEE Conference on Computer Vision and Pattern Recognition.

[21] Tim Oates et al. The robot baby and massive metacognition: Future vision, 2012, IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

[22] Lorenzo Natale et al. Object segmentation using independent motion detection, 2015, IEEE-RAS International Conference on Humanoid Robots (Humanoids).

[23] Tom Drummond et al. Faster and Better: A Machine Learning Approach to Corner Detection, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.