THE COMPUTATIONAL MAGIC OF THE VENTRAL STREAM: TOWARDS A THEORY

Tomaso Poggio, with appendices with and by Joel Leibo

Nature Precedings: hdl:10101/npre.2011.6117.2: Posted 16 Sep 2011

I conjecture that the sample complexity of object recognition is mostly due to geometric image transformations and that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn-and-discount image transformations. The most surprising implication of the theory emerging from these assumptions is that the computational goals and detailed properties of cells in the ventral stream follow from symmetry properties of the visual world through a process of unsupervised correlational learning. From the assumption of a hierarchy of areas with receptive fields of increasing size, the theory predicts that the size of the receptive fields determines which transformations are learned during development and then factored out during normal processing; that the transformation represented in each area determines the tuning of the neurons in that area, independently of the statistics of natural images; and that class-specific transformations are learned and represented at the top of the ventral stream hierarchy.

Some of the main predictions of this theory-in-fieri are:
• the types of transformations that are learned from visual experience depend on the size of the receptive fields (measured in terms of wavelength) and thus on the area (layer in the models) – assuming that the aperture size increases with layers;
• the mix of transformations learned determines the properties of the receptive fields – oriented bars in V1+V2, radial and spiral patterns in V4, up to class-specific tuning in AIT (e.g. face-tuned cells);
• class-specific modules – such as faces, places and possibly body areas – should exist in IT to process images of object classes.
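The idea of learning and then discounting a transformation can be illustrated with a toy sketch. The abstract does not specify a mechanism, so everything below is an assumption made for illustration: the "image" is a 1-D list, the transformation is a circular shift, "learning" means storing every shifted view of a few random templates, and "discounting" means max-pooling the dot products of an input with those stored views. The function names (`shift`, `learn_orbit`, `signature`) are hypothetical.

```python
import random

def shift(signal, k):
    """Circularly shift a 1-D signal by k positions (toy stand-in for an image transformation)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def learn_orbit(template):
    """Assumed 'learning' phase: store every shifted view of one template."""
    return [shift(template, k) for k in range(len(template))]

def signature(image, orbits):
    """Assumed 'discounting' phase: max-pool responses over the stored views of each template."""
    return [max(dot(image, view) for view in orbit) for orbit in orbits]

rng = random.Random(0)
n = 16
templates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
orbits = [learn_orbit(t) for t in templates]

image = [rng.gauss(0, 1) for _ in range(n)]
sig_original = signature(image, orbits)
sig_shifted = signature(shift(image, 5), orbits)

# The two signatures agree up to floating-point rounding, because shifting the
# input only permutes the set of responses that the max is taken over.
max_diff = max(abs(a - b) for a, b in zip(sig_original, sig_shifted))
print(max_diff)
```

In this sketch the signature is unchanged by the shift even though the templates are random and unrelated to the input, which is one way to read the claim that invariance is obtained from stored transformations rather than from the statistics of the images themselves.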
