Identification of Invariant Sensorimotor Structures as a Prerequisite for the Discovery of Objects

Perceiving the surrounding environment in terms of objects is useful for any general-purpose intelligent agent. In this paper, we investigate a fundamental mechanism that makes object perception possible: the identification of spatio-temporally invariant structures in the sensorimotor experience of an agent. Taking inspiration from the sensorimotor contingencies theory, we define a computational model of this mechanism through a sensorimotor, unsupervised, and predictive approach. Our model is based on processing the unsupervised interaction of an artificial agent with its environment. We show how spatio-temporally invariant structures in the environment induce regularities in the agent's sensorimotor experience, and how the agent, while building a predictive model of this experience, can capture them as densely connected subgraphs in a graph of sensory states linked by motor commands. Our approach focuses on elementary mechanisms and is illustrated with a set of simple experiments in which an agent interacts with its environment. We show how the agent can build an internal model of moving yet spatio-temporally invariant structures by applying spectral clustering to the graph modeling its overall sensorimotor experience. Finally, we systematically examine the properties of the model, shedding light more broadly on how this paradigm differs from methods based on the supervised processing of collections of static images.
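As a rough illustration of the clustering step described above (a minimal sketch, not the authors' implementation), the snippet below builds a graph whose nodes are discrete sensory states and whose edges record motor transitions between them, then partitions it with spectral clustering. Everything here is an illustrative assumption: the number of states, the hypothetical transition log, and the choice of two clusters.

```python
# Sketch: spectral clustering of a sensorimotor graph.
# Nodes = discrete sensory states; edges = observed motor transitions.
# All quantities (n_states, transitions, n_clusters) are hypothetical.
import numpy as np
from sklearn.cluster import SpectralClustering

n_states = 6
# Hypothetical log of (sensory state, motor command, next sensory state).
transitions = [(0, "left", 1), (1, "right", 0), (0, "up", 2),
               (2, "down", 0), (3, "left", 4), (4, "right", 3),
               (3, "up", 5), (5, "down", 3), (2, "left", 5)]

# Affinity matrix: connect two sensory states whenever some motor command
# maps one onto the other; weights count how often this was observed.
A = np.zeros((n_states, n_states))
for s, _cmd, s_next in transitions:
    A[s, s_next] += 1
    A[s_next, s] += 1  # symmetrize, as spectral clustering expects

# Densely connected subgraphs are candidate spatio-temporally invariant
# structures; spectral clustering recovers them via the graph Laplacian.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(labels)  # states sharing a label belong to one invariant structure
```

In this toy log, states {0, 1, 2} and {3, 4, 5} form two densely connected groups joined by a single bridge edge, so the clustering separates them, mirroring how the paper's agent groups sensory states that an invariant structure keeps mapping onto one another.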
