Learning to discount transformations as the computational goal of visual cortex

It has long been recognized that a key obstacle to achieving human-level object recognition performance is the problem of invariance [10]. The human visual system excels at factoring out the image transformations that distort object appearance under natural conditions. Models with a cortex-inspired architecture such as HMAX [9, 13], as well as nonbiological convolutional neural networks [5], are invariant to translation (and in some cases scaling) by virtue of their wiring. The transformations to which this approach has been applied so far are generic transformations: a single example image of any object contains all the information needed to synthesize a new image of the transformed object [15]. In a setting in which transformation invariance must be learned from visual experience (such as for a newborn human baby), we have shown that it is possible to learn from little visual experience how to be invariant to the translation of any object [7]. The same argument applies to all generic transformations.

Generic transformations can be "factored out" in recognition tasks (see figure 1), and this is key to good recognition performance. This is the reason underlying recent observations that random features often perform well on computer vision tasks [4, 6, 11, 12].

For simplicity, consider a specific example: HMAX. In an architecture such as HMAX, if an input image is encoded in terms of its similarity to a set of templates (typically via a dot product operation), and if the encoding is made invariant with respect to a transformation via appropriate pooling in C cells, then recognition performance inherits the invariance built into the encoding. The actual templates themselves do not enter the argument: the set of similarities of the input image to the templates need not be high in order to be invariant. From this point of view, the good performance achieved with random features on some vision tasks can largely be attributed to the invariance properties of the architecture.
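The pooling argument can be illustrated with a minimal numerical sketch: a hypothetical one-dimensional analogue of an S layer (dot products with templates at every position) followed by a C layer (max pooling over positions). The template count, window size, and signal length are illustrative assumptions, not taken from the text; note that the templates are random, yet the pooled code is exactly translation invariant.

```python
import numpy as np

rng = np.random.default_rng(0)

def c_encoding(signal, templates):
    """S layer: dot product of each template with every window of the signal.
    C layer: max-pool each template's responses over all positions."""
    n, k = len(signal), templates.shape[1]
    s = np.array([[signal[i:i + k] @ t for i in range(n - k + 1)]
                  for t in templates])
    return s.max(axis=1)  # one translation-invariant value per template

templates = rng.standard_normal((8, 5))  # random templates, never learned
obj = rng.standard_normal(5)             # an arbitrary "object" patch

# Embed the same object at two different positions (away from the borders,
# so both scenes contain the full range of object/window shifts).
scene1 = np.zeros(20)
scene1[4:9] = obj
scene2 = np.zeros(20)
scene2[11:16] = obj

code1 = c_encoding(scene1, templates)
code2 = c_encoding(scene2, templates)
assert np.allclose(code1, code2)  # identical code despite the translation
```

The invariance here comes entirely from the pooling stage, not from the templates: any fixed set of templates, random or not, produces a code that is unchanged by translation, which is the sense in which random features inherit the invariance built into the architecture.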

[1] Shimon Ullman, et al. Class-Based Feature Matching Across Unrestricted Transformations, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Tomaso Poggio, et al. From primal templates to invariant recognition, 2010.

[3] T. Poggio, et al. Hierarchical models of object recognition in cortex, 1999, Nature Neuroscience.

[4] Joel Z. Leibo, et al. Learning Generic Invariances in Object Recognition: Translation and Scale, 2010.

[5] Yann LeCun, et al. What is the best multi-stage architecture for object recognition?, 2009, IEEE 12th International Conference on Computer Vision.

[6] Terrence J. Sejnowski, et al. Slow Feature Analysis: Unsupervised Learning of Invariances, 2002, Neural Computation.

[7] Peter Földiák, et al. Learning Invariance from Transformation Sequences, 1991, Neural Computation.

[8] Zhenghao Chen, et al. On Random Weights and Unsupervised Feature Learning, 2011, ICML.

[9] Thomas Serre, et al. Robust Object Recognition with Cortex-Like Mechanisms, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Thomas Serre, et al. Learning complex cell invariance from natural videos: A plausibility proof, 2007.

[11] Tomaso Poggio, et al. Image Representations for Visual Learning, 1996, Science.

[12] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.

[13] Tomaso Poggio, et al. Robust Object Recognition with Cortex-Like Mechanisms, 2007.

[14] T. Poggio, et al. Neural mechanisms of object recognition, 2002, Current Opinion in Neurobiology.

[15] Thomas Serre, et al. A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex, 2005.

[16] Doris Y. Tsao, et al. A Cortical Region Consisting Entirely of Face-Selective Cells, 2006, Science.

[17] Tomaso A. Poggio, et al. Linear Object Classes and Image Synthesis From a Single Example Image, 1997, IEEE Transactions on Pattern Analysis and Machine Intelligence.