Learning viewpoint invariant object representations using a temporal coherence principle

Invariant object recognition is arguably one of the major challenges for contemporary machine vision systems. In contrast, the mammalian visual system performs this task virtually effortlessly. How can we exploit our knowledge of the biological system to improve artificial systems? Our understanding of the mammalian early visual system has been augmented by the discovery that general coding principles can explain many aspects of neuronal response properties. How can such schemes be transferred to system-level performance? In the present study we train cells on a particular variant of the general principle of temporal coherence, the “stability” objective. These cells are trained on unlabeled real-world images without a teaching signal. We show that after training, the cells form a representation that is largely independent of the viewpoint from which the stimulus is seen. This finding includes generalization to previously unseen viewpoints. The achieved representation is better suited for viewpoint-invariant object classification than the cells’ input patterns. This advantage for viewpoint-invariant classification is maintained even if training and classification take place in the presence of a distractor object that is likewise unlabeled. In summary, we show that unsupervised learning using a general coding principle facilitates the classification of real-world objects that are not segmented from the background and undergo complex, non-isomorphic transformations.
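The abstract does not spell out the “stability” objective, so the following is only an illustrative sketch of one common way to express temporal coherence: penalize the mean squared change of each cell's output between consecutive frames, normalized by that output's variance over time. The function name `stability_objective`, the normalization, and the toy signals are assumptions made for illustration and are not taken from the study.

```python
from statistics import pvariance


def stability_objective(activity):
    """Hypothetical temporal-stability score for a set of cells.

    activity: list of per-cell time series (one list of floats per cell).
    Returns the negative sum, over cells, of the mean squared frame-to-frame
    change divided by the variance of that cell's output. Values closer to
    zero mean the outputs vary more slowly in time.
    """
    total = 0.0
    for series in activity:
        var = pvariance(series)
        if var == 0.0:
            # A constant output has no variance to normalize by; skip it here.
            continue
        sq_diffs = [(b - a) ** 2 for a, b in zip(series, series[1:])]
        total -= (sum(sq_diffs) / len(sq_diffs)) / var
    return total


# Toy signals (assumed): one output drifts slowly, the other flickers.
slow = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
fast = [0.0, 0.5, 0.0, 0.5, 0.0, 0.5]

print(stability_objective([slow]) > stability_objective([fast]))
```

Under this assumed form, a cell whose output changes gradually across successive views of an object scores higher than one whose output flickers, which is the intuition behind using temporal coherence to learn viewpoint-independent responses. Dividing by the variance keeps the objective from being satisfied simply by shrinking the output amplitude.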
