The invariance hypothesis and the ventral stream

Unlike Athena, the new ventral stream theory foreshadowed in these pages did not spring fully formed from the head of Zeus. We (primarily Tomaso Poggio, Fabio Anselmi, Lorenzo Rosasco, Jim Mutch, Andrea Tacchetti, and myself) developed it, and continue to refine it, in a context where the questions considered in this dissertation loom large. Each of this dissertation's four main chapters can be read independently, but a common thread runs through them. In other manuscripts, some already released and some in preparation, we spin the same thread into the cloth of the new theory. Most chapters of this dissertation propose (or analyze) specific models designed to elucidate particular aspects of the ventral stream and the object recognition algorithm it implements. Some of them (chapters two and three) were mostly completed before we knew they were parts of a larger story. By contrast, the work of chapters four and five was undertaken more consciously as part of the larger theory's development. A version of each chapter has appeared, or soon will appear, as a standalone article.

Chapter one, the introduction, has two parts. It begins with a somewhat idiosyncratic review of perception (mostly focused on vision) that highlights the importance of cortical convolution and pooling operations, two ideas that turn out to be central to the subsequent chapters and to the new theory. This background section is written at an elementary level, aimed at an advanced undergraduate or a researcher in a different field. I originally wrote it (with Tomaso Poggio) as a chapter in the Handbook of Research in Biomimetic and Biohybrid Systems (eds. Prescott & Lepora), soon to be published by Oxford University Press. The second half of the first chapter gives a more up-to-date introduction to the new theory. It focuses on ideas that, while important to the main theory, are due much more to the work of others than to my own.
This second half combines sections that I originally wrote for two other manuscripts currently in review.

Chapter two, Measuring invariance, proposes an operational definition of invariance that can be used to compare neural data, behavior, and computational models. It then applies the definition to an analysis of the invariance properties of HMAX, an older model of the ventral stream [1, 2] that motivated much of our later work. In particular, we showed that, from a certain point of view, there is no selectivity-invariance trade-off at all: HMAX can be almost perfectly invariant to translation and scaling. Furthermore, we showed that high similarity between the template images and the to-be-recognized images is not a requirement for invariance. In many cases, surprisingly small numbers of templates suffice for robust recognition despite translation and scaling transformations. The finding that random dot patterns are perfectly reasonable templates for invariant recognition was surprising at the time. Thanks to subsequent development of the theory, it is now well understood. Other groups made similar observations around the same time [3, 4]. A version of chapter two appeared as a CSAIL technical report in 2010 [5], and we also published some of the same ideas in [6].

Chapter three, Learning and disrupting invariance in visual recognition with a temporal association rule, was joint work to which Leyla Isik and I contributed equally. We modeled Li and DiCarlo’s “invariance disruption” experiments [7, 8] using a modified HMAX model. In those experiments, monkeys passively viewed objects that changed identity while saccades brought them from a peripheral retinal position to the fovea. As little as one hour of exposure to this strange visual environment significantly affected the position invariance of single units. But don’t “errors of temporal association” like this happen all the time in normal vision?
Lights turn on and off, objects are occluded, you blink your eyes: all of these should cause errors in temporal association. If temporal association really is the method by which invariance to larger patterns is developed and maintained, why doesn’t the frequent violation of its assumptions lead to huge errors in invariance? In this work, we showed that these models are actually quite robust to this kind of error. As long as the errors are random (uncorrelated), a surprisingly large number of them can accumulate before the performance of the whole system suffers. This result later turned out to be important for the plausibility of our ventral stream theory, which relies on temporal association learning.

Chapter four builds on the results of a paper I wrote for NIPS in 2011 [9], entitled “Why The Brain Separates Face Recognition From Object Recognition”. In that paper, we conjectured that the need to discount class-specific transformations (e.g., 3D rotation in depth) is the reason the ventral stream contains domain-specific subregions (e.g., face patches [10–12]). We also give an explanation, based on the new theory, for why cells in the anterior (high-level) ventral stream are tuned to more complex features than cells in the more posterior, lower levels.

Chapter five is concerned with biologically plausible learning rules through which the brain could develop the circuitry necessary to implement these models. From these rules, we predict that neuronal tuning properties at all levels of the ventral stream are described by the solutions to a particular eigenvalue equation, which we named the cortical equation. For small receptive fields, as in primary visual cortex, its solutions are Gabor wavelets.
For large receptive fields, corresponding to a face-specific region, the predicted tuning curves for 3D rotations of faces resemble the mysterious mirror-symmetric tuning curves of neurons in the anterior lateral face patch [13].

Thesis Supervisor: Tomaso Poggio
Title: Eugene McDermott Professor
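
The convolution-and-pooling idea from chapter one, together with chapter two's observation that random dot patterns work as templates, can be illustrated with a toy example. This is a minimal sketch under simplifying assumptions that are not the thesis's own. It uses 1D signals, circular (wrap-around) translation, a single layer, and max pooling over every shift of each template, loosely in the spirit of HMAX's complex-cell layers. The helper names `transformed_templates` and `signature`, the sizes, and the shift of 7 are hypothetical choices for illustration.

```python
import numpy as np

def transformed_templates(template):
    """All circular translations of a 1D template, one per row (hypothetical helper)."""
    return np.stack([np.roll(template, s) for s in range(template.size)])

def signature(image, templates):
    """Max-pool each template's responses over all of its translations.

    Row k of transformed_templates(t) @ image is the 'simple cell' response
    to template t shifted by k (a circular cross-correlation); the max over k
    is the pooled 'complex cell' output for t.
    """
    return np.array([np.max(transformed_templates(t) @ image) for t in templates])

rng = np.random.default_rng(0)
n_pixels, n_templates = 32, 10
# Random binary dot patterns as templates: no resemblance to the images is required.
templates = rng.integers(0, 2, size=(n_templates, n_pixels)).astype(float)
image = rng.normal(size=n_pixels)
other = rng.normal(size=n_pixels)

sig = signature(image, templates)
sig_shifted = signature(np.roll(image, 7), templates)  # same image, translated
sig_other = signature(other, templates)                # a different image

print(np.allclose(sig, sig_shifted))  # invariance: expected to match
print(np.allclose(sig, sig_other))    # selectivity: expected to differ
```

Shifting the image only permutes the set of dot products each template produces, so any permutation-invariant pooling (max, mean, a histogram) would give the same exact invariance in this idealized setting. Real images, partial shifts, and scaling are not covered by this sketch.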

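Chapter five links biologically plausible learning rules to an eigenvalue equation. That link has a classical precedent that fits in a few lines: Oja's Hebbian rule (Oja, 1982) drives a single unit's weight vector toward the leading eigenvector of its input covariance. The sketch shows only that classical case. It assumes zero-mean Gaussian inputs with a known covariance and is not the cortical equation itself. The function names `oja_rule` and `make_data`, the learning rate, and the eigenvalues are hypothetical choices for illustration.

```python
import numpy as np

def oja_rule(samples, eta=0.002, seed=0):
    """Learn one weight vector from zero-mean samples with Oja's Hebbian rule.

    Update: y = w . x ;  w <- w + eta * y * (x - y * w)
    The -eta * y**2 * w term keeps ||w|| near 1, so w turns toward the
    leading eigenvector of the input covariance instead of growing unboundedly.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=samples.shape[1])
    w /= np.linalg.norm(w)
    for x in samples:
        y = w @ x
        w += eta * y * (x - y * w)
    return w

def make_data(n=20000, eigenvalues=(5.0, 2.0, 1.0, 0.5, 0.2), seed=1):
    """Zero-mean Gaussian samples with covariance Q diag(eigenvalues) Q^T."""
    rng = np.random.default_rng(seed)
    lams = np.asarray(eigenvalues)  # sorted in descending order
    q, _ = np.linalg.qr(rng.normal(size=(lams.size, lams.size)))  # random orthonormal basis
    z = rng.normal(size=(n, lams.size)) * np.sqrt(lams)
    return z @ q.T, q[:, 0]  # samples (one per row), true leading eigenvector

samples, v1 = make_data()
w = oja_rule(samples)
cosine = abs(w @ v1) / np.linalg.norm(w)  # sign of an eigenvector is arbitrary
print(round(float(cosine), 3))
```

With a constant learning rate the weights keep fluctuating slightly around the eigenvector, so the cosine approaches but need not equal 1.
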
[1] Paul A. Viola, et al. Robust Real-Time Face Detection, 2001, International Journal of Computer Vision.

[2] Joel Z. Leibo, et al. Learning Generic Invariances in Object Recognition: Translation and Scale, 2010.

[3] Ville Ojansivu, et al. Blur Insensitive Texture Classification Using Local Phase Quantization, 2008, ICISP.

[4] Edmund T. Rolls, et al. Learning invariant object recognition in the visual system with continuous transformations, 2006, Biological Cybernetics.

[5] Y. Miyashita. Neuronal correlate of visual associative long-term memory in the primate temporal cortex, 1988, Nature.

[6] P. König, et al. A Model of the Ventral Visual System Based on Temporal Stability and Local Memory, 2006, PLoS biology.

[7] Joel Z. Leibo, et al. The dynamics of invariant object recognition in the human visual system, 2014, Journal of neurophysiology.

[8] Nicolas Pinto, et al. How far can you get with a modern face recognition test set using only simple features?, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Thomas Serre, et al. A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex, 2005.

[10] Tomaso Poggio, et al. From primal templates to invariant recognition, 2010.

[11] D. Marr, et al. Representation and recognition of the spatial organization of three-dimensional shapes, 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[12] Keiji Tanaka, et al. Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex, 1994, Journal of neurophysiology.

[13] John F. Canny, et al. A Computational Approach to Edge Detection, 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Tomaso Poggio, et al. Linear Object Classes and Image Synthesis From a Single Example Image, 1997.

[15] Shimon Ullman, et al. Computation of pattern invariance in brain-like structures, 1999, Neural Networks.

[16] N. Kanwisher, et al. A Cortical Area Selective for Visual Processing of the Human Body, 2001, Science.

[17] Tomaso Poggio, et al. CNS: a GPU-based framework for simulating cortically-organized networks, 2010.

[18] David D. Cox, et al. What response properties do individual neurons need to underlie position and clutter "invariant" object recognition?, 2009, Journal of neurophysiology.

[19] Tomaso Poggio, et al. Trade-Off between Object Selectivity and Tolerance in Monkey Inferotemporal Cortex, 2007, The Journal of Neuroscience.

[20] A. Cowey, et al. The ganglion cell and cone distributions in the monkey's retina: Implications for central magnification factors, 1985, Vision Research.

[21] G. Mitchison. Neuronal branching patterns and the economy of cortical wiring, 1991, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[22] T. Poggio, et al. View-based models of 3D object recognition: invariance to imaging transformations, 1995, Cerebral cortex.

[23] T. Carlson, et al. High temporal resolution decoding of object position and category, 2011, Journal of vision.

[24] N. Kanwisher. Functional specificity in the human brain: A window into the functional architecture of the mind, 2010, Proceedings of the National Academy of Sciences.

[25] N. Kanwisher, et al. The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception, 1997, The Journal of Neuroscience.

[26] Doris Y. Tsao, et al. Functional Compartmentalization and Viewpoint Generalization Within the Macaque Face-Processing System, 2010, Science.

[27] Tomaso A. Poggio, et al. Linear Object Classes and Image Synthesis From a Single Example Image, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[28] Eero P. Simoncelli, et al. Natural image statistics and neural representation, 2001, Annual review of neuroscience.

[29] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30] H. Barlow. Vision: A computational investigation into the human representation and processing of visual information: David Marr. San Francisco: W. H. Freeman, 1982. pp. xvi + 397, 1983.

[31] Terence D. Sanger, et al. Optimal unsupervised learning in a single-layer linear feedforward neural network, 1989, Neural Networks.

[32] N. Logothetis, et al. View-dependent object recognition by monkeys, 1994, Current Biology.

[33] Cordelia Schmid, et al. Is that you? Metric learning approaches for face identification, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[34] Thomas Serre, et al. A feedforward architecture accounts for rapid categorization, 2007, Proceedings of the National Academy of Sciences.

[35] Philippe G. Schyns, et al. Diagnostic recognition: task constraints, object information, and their interactions, 1998, Cognition.

[36] M. Tarr, et al. Becoming a “Greeble” Expert: Exploring Mechanisms for Face Recognition, 1997, Vision Research.

[37] D. B. Bender, et al. Visual properties of neurons in inferotemporal cortex of the Macaque, 1972, Journal of neurophysiology.

[38] Zhenghao Chen, et al. On Random Weights and Unsupervised Feature Learning, 2011, ICML.

[39] R. Douglas, et al. Neuronal circuits of the neocortex, 2004, Annual review of neuroscience.

[40] J. O'Regan, et al. Some results on translation invariance in the human visual system, 1990, Spatial vision.

[41] P. Downing, et al. Selectivity for the human body in the fusiform gyrus, 2005, Journal of neurophysiology.

[42] M. Fahle, et al. The role of visual field position in pattern-discrimination learning, 1997, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[43] David G. Lowe, et al. University of British Columbia, 1945, Canadian Medical Association journal.

[44] D. M. Green, et al. Signal detection theory and psychophysics, 1966.

[45] J. DiCarlo, et al. 'Breaking' position-invariant object recognition, 2005, Nature Neuroscience.

[46] Joel Z. Leibo, et al. Learning invariant representations and applications to face verification, 2013, NIPS.

[47] P. Downing, et al. The role of occipitotemporal body-selective regions in person perception, 2011, Cognitive neuroscience.

[48] Timothée Masquelier, et al. Unsupervised Learning of Visual Features through Spike Timing Dependent Plasticity, 2007, PLoS Comput. Biol.

[49] Tomaso A. Poggio, et al. On Edge Detection, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50] S. Gerber, et al. Unsupervised Natural Experience Rapidly Alters Invariant Object Representation in Visual Cortex, 2008.

[51] Tomaso Poggio, et al. Computational vision and regularization theory, 1985, Nature.

[52] David G. Lowe, et al. Object recognition from local scale-invariant features, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[53] A. Tarski, et al. What are logical notions, 1986.

[54] N. Kanwisher, et al. The fusiform face area: a cortical region specialized for the perception of faces, 2006, Philosophical Transactions of the Royal Society B: Biological Sciences.

[55] Lorenzo Rosasco, et al. Spectral Algorithms for Supervised Learning, 2008, Neural Computation.

[56] Gerald Penn, et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[57] Patrick J. Flynn, et al. Overview of the face recognition grand challenge, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[58] Julian Eggert, et al. Learning viewpoint invariant object representations using a temporal coherence principle, 2005, Biological Cybernetics.

[59] Doris Y. Tsao, et al. A Cortical Region Consisting Entirely of Face-Selective Cells, 2006, Science.

[60] D. B. Bender, et al. Visual Receptive Fields of Neurons in Inferotemporal Cortex of the Monkey, 1969, Science.

[61] N. Logothetis, et al. fMRI of the Face-Processing Network in the Ventral Temporal Lobe of Awake and Anesthetized Macaques, 2011, Neuron.

[62] David D. Cox, et al. A High-Throughput Screening Approach to Discovering Good Forms of Biologically Inspired Visual Representation, 2009, PLoS Comput. Biol.

[63] Thomas Serre, et al. Robust Object Recognition with Cortex-Like Mechanisms, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64] J. Maunsell, et al. Anterior inferotemporal neurons of monkeys engaged in object recognition can be highly sensitive to object retinal position, 2003, Journal of neurophysiology.

[65] T. Poggio. The Computational Magic of the Ventral Stream: Towards a Theory, 2011.

[66] Tomaso Poggio, et al. From Understanding Computation to Understanding Neural Circuitry, 1976.

[67] Angela J. Yu, et al. Uncertainty, Neuromodulation, and Attention, 2005, Neuron.

[68] Tomaso Poggio, et al. Fast Readout of Object Identity from Macaque Inferior Temporal Cortex, 2005, Science.

[69] Doris Y. Tsao, et al. Patches with Links: A Unified System for Processing Faces in the Macaque Temporal Lobe, 2008, Science.

[70] Nancy Kanwisher, et al. A cortical representation of the local visual environment, 1998, Nature.

[71] S. Edelman, et al. Imperfect Invariance to Object Translation in the Discrimination of Complex Shapes, 2001, Perception.

[72] R. Vogels, et al. Spatial sensitivity of macaque inferior temporal neurons, 2000, The Journal of comparative neurology.

[73] Denis Fize, et al. Speed of processing in the human visual system, 1996, Nature.

[74] Tomaso Poggio, et al. A hierarchical model of peripheral vision, 2011.

[75] Michael W. Spratling. Learning viewpoint invariant perceptual representations from cluttered images, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[76] Frédéric Jurie, et al. Face Recognition using Local Quantized Patterns, 2012, BMVC.

[77] Tomaso A. Poggio, et al. A Canonical Neural Circuit for Cortical Nonlinear Operations, 2008, Neural Computation.

[78] Joel Z. Leibo, et al. Subtasks of Unconstrained Face Recognition, 2014, 2014 International Conference on Computer Vision Theory and Applications (VISAPP).

[79] Eero P. Simoncelli, et al. Spatiotemporal Elements of Macaque V1 Receptive Fields, 2005, Neuron.

[80] Tal Hassner, et al. Effective Unconstrained Face Recognition by Combining Multiple Descriptors and Learned Background Statistics, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[81] Ha Hong, et al. The Neural Representation Benchmark and its Evaluation on Brain and Machine, 2013, ICLR.

[82] Deva Ramanan, et al. Face detection, pose estimation, and landmark localization in the wild, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[83] T. Poggio, et al. A network that learns to recognize three-dimensional objects, 1990, Nature.

[84] Joel Z. Leibo, et al. Learning and disrupting invariance in visual recognition with a temporal association rule, 2011, Front. Comput. Neurosci.

[85] Matthew A. Kupinski, et al. Objective Assessment of Image Quality, 2005.

[86] D. Schacter, et al. On the nature of medial temporal lobe contributions to the constructive simulation of future events, 2009, Philosophical Transactions of the Royal Society B: Biological Sciences.

[87] Koen E. A. van de Sande, et al. Empowering Visual Categorization With the GPU, 2011, IEEE Transactions on Multimedia.

[88] Takeo Kanade, et al. Multi-PIE, 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[89] H. H. Bülthoff, et al. Psychophysical support for a two-dimensional view interpolation theory of object recognition, 1992, Proceedings of the National Academy of Sciences of the United States of America.

[90] Tomaso Poggio, et al. Everything old is new again: a fresh look at historical approaches in machine learning, 2002.

[91] T. Poggio, et al. Hierarchical models of object recognition in cortex, 1999, Nature Neuroscience.

[92] Erkki Oja, et al. Principal components, minor components, and linear neural networks, 1992, Neural Networks.

[93] Peter Földiák, et al. Learning Invariance from Transformation Sequences, 1991, Neural Comput.

[94] E. H. Adelson, et al. Spatiotemporal energy models for the perception of motion, 1985, Journal of the Optical Society of America. A, Optics and image science.

[95] James J. DiCarlo, et al. How Does the Brain Solve Visual Object Recognition?, 2012, Neuron.

[96] Michael A. Arbib, et al. The handbook of brain theory and neural networks, 1995, A Bradford book.

[97] M. Sur, et al. Experimentally induced visual projections into auditory thalamus and cortex, 1988, Science.

[98] Wendy L. Braje, et al. Illumination effects in face recognition, 1998, Psychobiology.

[99] Terrence J. Sejnowski, et al. Slow Feature Analysis: Unsupervised Learning of Invariances, 2002, Neural Computation.

[100] S. Shamma. On the role of space and time in auditory processing, 2001, Trends in Cognitive Sciences.

[101] Leslie G. Ungerleider, et al. Object vision and spatial vision: two cortical pathways, 1983, Trends in Neurosciences.

[102] Joel Z. Leibo, et al. Why The Brain Separates Face Recognition From Object Recognition, 2011, NIPS.

[103] David J. Freedman, et al. Dynamic population coding of category information in inferior temporal and prefrontal cortex, 2008, Journal of neurophysiology.

[104] S. Shamma. Speech processing in the auditory system. II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve, 1985, The Journal of the Acoustical Society of America.

[105] Andrea Vedaldi, et al. Vlfeat: an open and portable library of computer vision algorithms, 2010, ACM Multimedia.

[106] George W. Quinn, et al. Report on the Evaluation of 2D Still-Image Face Recognition Algorithms, 2011.

[107] H. Barrett, et al. Objective assessment of image quality. III. ROC metrics, ideal observers, and likelihood-generating functions, 1998, Journal of the Optical Society of America. A, Optics, image science, and vision.

[108] Matti Pietikäinen, et al. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[109] P. Schyns, et al. Information and viewpoint dependence in face recognition, 1997, Cognition.

[110] I. Biederman, et al. Evidence for Complete Translational and Reflectional Invariance in Visual Object Priming, 1991, Perception.

[111] Nicolas Pinto, et al. Establishing Good Benchmarks and Baselines for Face Recognition, 2008.

[112] J. DiCarlo, et al. Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal Cortex, 2010, Neuron.

[113] M. Bar, et al. Scenes Unseen: The Parahippocampal Cortex Intrinsically Subserves Contextual Associations, Not Scenes or Places Per Se, 2008, The Journal of Neuroscience.

[114] N. Kanwisher, et al. Visual word processing and experiential origins of functional selectivity in human extrastriate cortex, 2007, Proceedings of the National Academy of Sciences.

[115] D. Perrett, et al. Visual neurones responsive to faces in the monkey temporal cortex, 2004, Experimental Brain Research.

[116] E. Marder. Neuromodulation of Neuronal Circuits: Back to the Future, 2012, Neuron.

[117] Robbe L. T. Goris, et al. Neural Representations That Support Invariant Object Recognition, 2022, Frontiers in Computational Neuroscience.

[118] Sameer A. Nene, et al. Columbia Object Image Library (COIL100), 1996.

[119] David A. McAllester, et al. Object Detection with Discriminatively Trained Part Based Models, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[120] Bartlett W. Mel. SEEMORE: Combining Color, Shape, and Texture Histogramming in a Neurally Inspired Approach to Visual Object Recognition, 1997, Neural Computation.

[121] D. V. Essen, et al. Surface-Based and Probabilistic Atlases of Primate Cerebral Cortex, 2007, Neuron.

[122] R. Malach, et al. Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex, 1995, Proceedings of the National Academy of Sciences of the United States of America.

[123] David J. Field, et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images, 1996, Nature.

[124] Pietro Perona, et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[125] J. Knott. The organization of behavior: A neuropsychological theory, 1951.

[126] H. Barlow. Why have multiple cortical areas?, 1986, Vision Research.

[127] Marwan Mattar, et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments, 2008.

[128] N. Kanwisher, et al. Face perception: domain specific, not process specific, 2004, Neuron.

[129] S. Lehéricy, et al. The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients, 2000, Brain: a journal of neurology.

[130] David H. Foster, et al. Visual Comparison of Rotated and Reflected Random-Dot Patterns as a Function of Their Positional Symmetry and Separation in the Field, 1981.

[131] D. Chklovskii, et al. Maps in the brain: what can we learn from them?, 2004, Annual review of neuroscience.

[132] Kevan A. C. Martin, et al. A Canonical Microcircuit for Neocortex, 1989, Neural Computation.

[133] Nicolas Pinto, et al. Beyond simple features: A large-scale feature search approach to unconstrained face recognition, 2011, Face and Gesture 2011.

[134] Alan L. Yuille, et al. A regularized solution to edge detection, 1985, J. Complex.

[135] Scott D. Slotnick, et al. The Visual Word Form Area, 2013.

[136] Guy Wallis, et al. Learning Illumination- and Orientation-invariant Representations of Objects through Temporal Association, 2022.

[137] Heinrich H. Bülthoff, et al. Image-based object recognition in man, monkey and machine, 1998, Cognition.

[138] Thomas Serre, et al. Learning complex cell invariance from natural videos: A plausibility proof, 2007.

[139] H. Bülthoff, et al. Effects of temporal association on recognition memory, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[140] Lorenzo Rosasco, et al. The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work), 2012.

[141] N. Logothetis, et al. Shape representation in the inferior temporal cortex of monkeys, 1995, Current Biology.

[142] M. Fahle, et al. Limited translation invariance of human visual pattern recognition, 1998, Perception & psychophysics.

[143] S. Nelson, et al. Homeostatic plasticity in the developing nervous system, 2004, Nature Reviews Neuroscience.

[144] David G. Lowe, et al. Multiclass Object Recognition with Sparse, Localized Features, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[145] Edmund T. Rolls, et al. Invariant Object Recognition in the Visual System with Novel Views of 3D Objects, 2002, Neural Computation.

[146] E. Rolls, et al. Invariant Face and Object Recognition in the Visual System, 1997, Progress in Neurobiology.

[147] E. Rolls, et al. View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex, 1998, Cerebral cortex.

[148] Yann LeCun, et al. What is the best multi-stage architecture for object recognition?, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[149] W. Pitts, et al. A Logical Calculus of the Ideas Immanent in Nervous Activity (1943), 2021, Ideas That Created the Future.

[150] Matti Pietikäinen, et al. Multiscale Local Phase Quantization for Robust Component-Based Face Recognition Using Kernel Fusion of Multiple Descriptors, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[151] David I. Perrett, et al. Neurophysiology of shape processing, 1993, Image Vis. Comput.

[152] Jian Sun, et al. Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[153] Joel Z. Leibo, et al. Neurons That Confuse Mirror-Symmetric Object Views, 2010.

[154] I. Biederman. Recognition-by-components: a theory of human image understanding, 1987, Psychological review.

[155] Nicole C. Rust, et al. Selectivity and Tolerance (“Invariance”) Both Increase as Visual Information Propagates from Cortical Area V4 to IT, 2010, The Journal of Neuroscience.

[156] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.

[157] E. Oja. Simplified neuron model as a principal component analyzer, 1982, Journal of mathematical biology.

[158] M. Stryker. Temporal associations, 1991, Nature.

[159] Kunihiko Fukushima, et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, 1980, Biological Cybernetics.

[160] Nicolas Pinto, et al. Comparing state-of-the-art visual features on invariant object recognition tasks, 2011, 2011 IEEE Workshop on Applications of Computer Vision (WACV).

[161] M. Tarr, et al. Do viewpoint-dependent mechanisms generalize across members of a class?, 1998, Cognition.

[162] S. Ullman. Aligning pictorial descriptions: An approach to object recognition, 1989, Cognition.

[163] Nicolas Pinto, et al. Why is Real-World Visual Object Recognition Hard?, 2008, PLoS Comput. Biol.

[164] Doris Y. Tsao, et al. Faces and objects in macaque cerebral cortex, 2003, Nature Neuroscience.

[165] D. Hubel, et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, 1962, The Journal of physiology.

[166] Laurenz Wiskott, et al. Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells, 2007, PLoS Comput. Biol.

[167] Joel Z. Leibo, et al. Does invariant recognition predict tuning of neurons in sensory cortex?, 2013.

[168] Michael J. Tarr. Is human object recognition better described by geon structural description or by multiple views, 1995.

[169] H. Bülthoff, et al. Face recognition under varying poses: The role of texture and shape, 1996, Vision Research.