Learning to discount transformations as the computational goal of visual cortex

It has long been recognized that a key obstacle to achieving human-level object recognition performance is the problem of invariance [10]. The human visual system excels at factoring out the image transformations that distort object appearance under natural conditions. Models with a cortex-inspired architecture such as HMAX [9, 13], as well as nonbiological convolutional neural networks [5], are invariant to translation (and in some cases scaling) by virtue of their wiring. The transformations to which this approach has been applied so far are generic transformations: a single example image of any object contains all the information needed to synthesize a new image of the transformed object [15]. In a setting in which transformation invariance must be learned from visual experience (such as for a newborn human baby), we have shown that it is possible to learn from little visual experience how to be invariant to the translation of any object [7]. The same argument applies to all generic transformations.

Generic transformations can be "factored out" in recognition tasks (see figure 1), and this is key to good recognition performance. This is the reason underlying recent observations that random features often perform well on computer vision tasks [4, 6, 11, 12].

For simplicity, consider a specific example: HMAX. In an architecture such as HMAX, if an input image is encoded in terms of its similarity to a set of templates (typically via a dot product operation), and if the encoding is made invariant with respect to a transformation via appropriate pooling in C cells, then recognition performance inherits the invariance built into the encoding. The actual templates themselves do not enter the argument: the set of similarities of the input image to the templates need not be high in order to be invariant. From this point of view, the good performance achieved with random features on some vision tasks can largely be attributed to the invariance properties of the architecture.
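The pooling argument can be illustrated with a minimal numerical sketch: a hypothetical one-dimensional analogue of an S layer (dot products with templates at every position) followed by a C layer (max pooling over positions). The template count, window size, and signal length are illustrative assumptions, not taken from the text; note that the templates are random, yet the pooled code is exactly translation invariant.

```python
import numpy as np

rng = np.random.default_rng(0)

def c_encoding(signal, templates):
    """S layer: dot product of each template with every window of the signal.
    C layer: max-pool each template's responses over all positions."""
    n, k = len(signal), templates.shape[1]
    s = np.array([[signal[i:i + k] @ t for i in range(n - k + 1)]
                  for t in templates])
    return s.max(axis=1)  # one translation-invariant value per template

templates = rng.standard_normal((8, 5))  # random templates, never learned
obj = rng.standard_normal(5)             # an arbitrary "object" patch

# Embed the same object at two different positions (away from the borders,
# so both scenes contain the full range of object/window shifts).
scene1 = np.zeros(20)
scene1[4:9] = obj
scene2 = np.zeros(20)
scene2[11:16] = obj

code1 = c_encoding(scene1, templates)
code2 = c_encoding(scene2, templates)
assert np.allclose(code1, code2)  # identical code despite the translation
```

The invariance here comes entirely from the pooling stage, not from the templates: any fixed set of templates, random or not, produces a code that is unchanged by translation, which is the sense in which random features inherit the invariance built into the architecture.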

[1] Shimon Ullman, et al. Class-Based Feature Matching Across Unrestricted Transformations, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Tomaso Poggio, et al. From primal templates to invariant recognition, 2010.

[3] T. Poggio, et al. Hierarchical models of object recognition in cortex, 1999, Nature Neuroscience.

[4] Joel Z. Leibo, et al. Learning Generic Invariances in Object Recognition: Translation and Scale, 2010.

[5] Yann LeCun, et al. What is the best multi-stage architecture for object recognition?, 2009, IEEE 12th International Conference on Computer Vision.

[6] Terrence J. Sejnowski, et al. Slow Feature Analysis: Unsupervised Learning of Invariances, 2002, Neural Computation.

[7] Peter Földiák, et al. Learning Invariance from Transformation Sequences, 1991, Neural Computation.

[8] Zhenghao Chen, et al. On Random Weights and Unsupervised Feature Learning, 2011, ICML.

[9] Thomas Serre, et al. Robust Object Recognition with Cortex-Like Mechanisms, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Thomas Serre, et al. Learning complex cell invariance from natural videos: A plausibility proof, 2007.

[11] Tomaso Poggio, et al. Image Representations for Visual Learning, 1996, Science.

[12] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.

[13] Tomaso Poggio, et al. Robust Object Recognition with Cortex-Like Mechanisms, 2007.

[14] T. Poggio, et al. Neural mechanisms of object recognition, 2002, Current Opinion in Neurobiology.

[15] Thomas Serre, et al. A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex, 2005.

[16] Doris Y. Tsao, et al. A Cortical Region Consisting Entirely of Face-Selective Cells, 2006, Science.

[17] Tomaso A. Poggio, et al. Linear Object Classes and Image Synthesis From a Single Example Image, 1997, IEEE Transactions on Pattern Analysis and Machine Intelligence.