Visual Object Recognition: Can A Single Mechanism Suffice?

Michael J. Tarr
Brown University

“In actual, physical life I can turn as simply and swiftly as anyone. But mentally, with my eyes closed and my body immobile, I am unable to switch from one direction to the other. Some swivel cell in my brain does not work. I can cheat, of course, by setting aside the mental snapshot of one vista and leisurely selecting the opposite view for my walk back to my starting point. But if I do not cheat, some kind of atrocious obstacle, which would drive me mad if I persevered, prevents me from imagining the twist which transforms one direction into another, directly opposite. I am crushed, I am carrying the whole world on my back in the process of trying to visualize my turning around and making myself see in terms of “right” what I saw in terms of “left” and vice versa.”

Vladimir Nabokov

How do humans recognize 3D objects? This simple question leads to surprisingly complex answers. Indeed, object recognition is sufficiently difficult that state-of-the-art computer vision systems can only perform the most rudimentary visual tasks and, even then, only under highly constrained conditions. At the heart of what makes visual recognition difficult are two factors. First, we live in a world made up of 3D objects, yet only receive 2D stimulation on our retinae as sense input. Second, we live in a highly variable world in which images of objects change constantly due to transformations in size, position, orientation, pose, color, lighting, and configuration. The challenge is to derive a consistent mapping from a potentially infinite set of images to a relatively small number of known objects and categories. It is a problem that the human visual system routinely and effortlessly solves.

How the mammalian brain solves the problem of visual recognition has been a topic of study since the early days of cognitive science.
David Hubel and Torsten Wiesel (1959) received the Nobel Prize for their discovery of organized columns of orientation-tuned neurons in cat visual cortex. This critical result appeared to capture an important facet of visual processing: a visual system that is sensitive to edges (boundaries between regions of light and dark) positioned at different orientations in space. Once the particular orientations of edges were known, it seemed only a small step to “connect the dots” by joining edges into more complex descriptions of object shape. Edge-based representations appeared ideal for recognition: shape-defining edges often capture the critical features of objects and remain relatively invariant over many image transformations. Thus, most vision scientists came to believe that the goal of vision was to derive or reconstruct an edge-based description of object shape.

It was this belief that drove David Marr to develop two ideas that have dramatically influenced the study of visual object recognition. The first was a computational theory in which Marr and Ellen Hildreth (1980) proposed what they saw as the processing constraints needed to build a successful edge detector. They observed that an implemented version of their detector behaved much like some subclasses of visual neurons (so-called “simple cells”). Once Marr had a plausible algorithm for finding
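Marr and Hildreth's edge detector is usually summarized as filtering the image with the Laplacian of a Gaussian (∇²G) and marking edges where the filtered image crosses zero. The NumPy sketch below illustrates that idea on a synthetic step edge. It is not their implementation: the function names (`log_kernel`, `filter2d`, `zero_crossings`), the 3σ kernel radius, the zero-sum correction, and the `min_slope` threshold are all illustrative assumptions.

```python
import numpy as np


def log_kernel(sigma: float) -> np.ndarray:
    """Sampled Laplacian-of-Gaussian (nabla^2 G), up to a constant scale factor."""
    radius = int(np.ceil(3 * sigma))  # assumption: truncate the kernel at 3 sigma
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    kernel = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    # Force the sampled kernel to sum to zero so uniform regions give no response.
    return kernel - kernel.mean()


def filter2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size filtering with edge padding (the kernel is symmetric, so this equals convolution)."""
    r = kernel.shape[0] // 2
    padded = np.pad(image.astype(float), r, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def zero_crossings(response: np.ndarray, min_slope: float = 0.1) -> np.ndarray:
    """Mark pixels whose filtered value changes sign against the right or lower neighbour."""
    # Treat floating-point residue in flat regions as exactly zero.
    r = np.where(np.abs(response) < 1e-9, 0.0, response)
    edges = np.zeros(r.shape, dtype=bool)
    # A crossing needs opposite signs and a large enough jump (hypothetical min_slope).
    horiz = (r[:, :-1] * r[:, 1:] < 0) & (np.abs(r[:, :-1] - r[:, 1:]) > min_slope)
    vert = (r[:-1, :] * r[1:, :] < 0) & (np.abs(r[:-1, :] - r[1:, :]) > min_slope)
    edges[:, :-1] |= horiz
    edges[:-1, :] |= vert
    return edges


if __name__ == "__main__":
    # A vertical step edge: columns 0 to 9 dark, columns 10 to 19 bright.
    image = np.zeros((20, 20))
    image[:, 10:] = 1.0
    edges = zero_crossings(filter2d(image, log_kernel(sigma=1.5)))
    print(np.unique(np.nonzero(edges)[1]))  # prints [9]: one column at the boundary
```

The detector reports a single column of edge pixels at the dark/bright boundary, and nothing inside the uniform regions. Larger `sigma` values smooth away finer image detail before edges are located.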

References

[1] D. Hubel et al. Receptive fields of single neurones in the cat's striate cortex, 1959, The Journal of Physiology.

[2] R. Shepard et al. Mental Rotation of Three-Dimensional Objects, 1971, Science.

[3] Wayne D. Gray et al. Basic objects in natural categories, 1976, Cognitive Psychology.

[4] M. Corballis et al. Decisions about identity and orientation of rotated letters and digits, 1978, Memory & Cognition.

[5] D. Marr et al. Representation and recognition of the spatial organization of three-dimensional shapes, 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[6] D. Marr et al. Theory of edge detection, 1980, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[7] G. V. Van Hoesen et al. Prosopagnosia, 1982, Neurology.

[8] S. Carey et al. Why faces are and are not special: an effect of expertise, 1986.

[9] I. Biederman. Recognition-by-components: a theory of human image understanding, 1987, Psychological Review.

[10] Y. Miyashita. Neuronal correlate of visual associative long-term memory in the primate temporal cortex, 1988, Nature.

[11] G. Rhodes et al. Expertise and configural coding in face recognition, 1989, British Journal of Psychology.

[12] M. Tarr et al. Mental rotation and orientation-dependence in shape recognition, 1989, Cognitive Psychology.

[13] Michael J. Tarr et al. Orientation dependence in three-dimensional object recognition, 1989.

[14] T. Poggio et al. A network that learns to recognize three-dimensional objects, 1990, Nature.

[15] Pierre Jolicoeur et al. Identification of Disoriented Objects: A Dual-systems Theory, 1990.

[16] Mark H. Johnson et al. Biology and Cognitive Development: The Case of Face Recognition, 1993.

[17] J. Tanaka et al. Object categories and expertise: Is the basic level in the eye of the beholder?, 1991, Cognitive Psychology.

[18] J. Sergent et al. Functional neuroanatomy of face and object processing. A positron emission tomography study, 1992, Brain: A Journal of Neurology.

[19] H. H. Bülthoff et al. Psychophysical support for a two-dimensional view interpolation theory of object recognition, 1992, Proceedings of the National Academy of Sciences of the United States of America.

[20] I. Biederman et al. Recognizing depth-rotated objects: evidence and conditions for three-dimensional viewpoint invariance, 1993, Journal of Experimental Psychology: Human Perception and Performance.

[21] M. Farah et al. Parts and Wholes in Face Recognition, 1993, The Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology.

[22] J. Elman. Learning and development in neural networks: the importance of starting small, 1993, Cognition.

[23] P. D. Eimas et al. Studies on the formation of perceptually based basic-level categories in young infants, 1994, Child Development.

[24] B. Gibson et al. Does orientation-independent object recognition precede orientation-dependent recognition? Evidence from a cuing paradigm, 1994, Journal of Experimental Psychology: Human Perception and Performance.

[25] Isabel Gauthier et al. Geon recognition is viewpoint dependent, 1994.

[26] M. J. Tarr et al. Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993), 1995, Journal of Experimental Psychology: Human Perception and Performance.

[27] M. Tarr et al. To What Extent Do Unique Parts Influence Recognition Across Changes in Viewpoint?, 1995.

[28] Martha J. Farah et al. Face perception and within-category discrimination in prosopagnosia, 1995, Neuropsychologia.

[29] T. Allison et al. Face-sensitive regions in human extrastriate cortex studied by functional MRI, 1995, Journal of Neurophysiology.

[30] D. Plaut. Double dissociation without modularity: evidence from connectionist neuropsychology, 1995, Journal of Clinical and Experimental Neuropsychology.

[31] M. Tarr. Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects, 1995, Psychonomic Bulletin & Review.

[32] S. Ullman et al. Generalization to Novel Images in Upright and Inverted Faces, 1993, Perception.

[33] Josh H. McDermott et al. Functional imaging of human visual recognition, 1996, Cognitive Brain Research.

[34] Tomaso Poggio et al. Image Representations for Visual Learning, 1996, Science.

[35] M. Tarr et al. Orientation Priming of Novel Shapes in the Context of Viewpoint-Dependent Recognition, 1997, Perception.

[36] Roland Baddeley et al. Optimal, Unsupervised Learning in Invariant Object Recognition, 1997, Neural Computation.

[37] Michael J. Tarr et al. Representation of three-dimensional object similarity in human vision, 1997, Electronic Imaging.

[38] G. Winocur et al. What Is Special about Face Recognition? Nineteen Experiments on a Person with Visual Object Agnosia and Dyslexia but Normal Face Recognition, 1997, Journal of Cognitive Neuroscience.

[39] E. Rolls et al. Invariant Face and Object Recognition in the Visual System, 1997, Progress in Neurobiology.

[40] M. Tarr et al. Levels of categorization in visual recognition studied using functional magnetic resonance imaging, 1997, Current Biology.

[41] M. Tarr et al. Becoming a “Greeble” Expert: Exploring Mechanisms for Face Recognition, 1997, Vision Research.

[42] M. Tarr et al. Testing conditions for viewpoint invariance in object recognition, 1997, Journal of Experimental Psychology: Human Perception and Performance.

[43] James W. Tanaka et al. Expertise in object and face recognition, 1997.

[44] N. Kanwisher et al. The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception, 1997, The Journal of Neuroscience.

[45] D. Perrett et al. Evidence accumulation in cell populations responsive to faces: an account of generalisation of recognition without mental transformations, 1998, Cognition.

[46] M. Tarr et al. Training ‘greeble’ experts: a framework for studying expert object recognition processes, 1998, Vision Research.

[47] Isabel Gauthier et al. Three-dimensional object recognition is viewpoint dependent, 1998, Nature Neuroscience.

[48] W. Hayward. Effects of outline shape in object recognition, 1998.

[49] Heinrich H. Bülthoff et al. Image-based object recognition in man, monkey and machine, 1998, Cognition.

[50] M. Tarr et al. Do viewpoint-dependent mechanisms generalize across members of a class?, 1998, Cognition.

[51] Philippe G. Schyns et al. Diagnostic recognition: task constraints, object information, and their interactions, 1998, Cognition.

[52] M. Tarr. News On Views: Pandemonium Revisited, 1999, Nature Neuroscience.

[53] M. Tarr et al. Can Face Recognition Really be Dissociated from Object Recognition?, 1999, Journal of Cognitive Neuroscience.

[54] T. Poggio et al. Hierarchical models of object recognition in cortex, 1999, Nature Neuroscience.

[55] M. Tarr et al. Activation of the middle fusiform 'face area' increases with expertise in recognizing novel objects, 1999, Nature Neuroscience.

[56] David G. Lowe et al. Towards a Computational Model for Object Recognition in IT Cortex, 2000, Biologically Motivated Computer Vision.

[57] M. Tarr et al. Does Visual Subordinate-Level Categorisation Engage the Functionally Defined Fusiform Face Area?, 2000, Cognitive Neuropsychology.

[58] M. Tarr et al. The Fusiform Face Area is Part of a Network that Processes Faces at the Individual Level, 2000, Journal of Cognitive Neuroscience.

[59] M. Tarr et al. FFA: a flexible fusiform area for subordinate-level visual processing automatized by expertise, 2000, Nature Neuroscience.

[60] I. Gauthier et al. Expertise for cars and birds recruits brain areas involved in face recognition, 2000, Nature Neuroscience.

[61] Kunihiko Fukushima et al. Active and Adaptive Vision: Neural Network Models, 2000, Biologically Motivated Computer Vision.

[62] N. Kanwisher. Domain specificity in face perception, 2000, Nature Neuroscience.

[63] Shimon Ullman et al. Object Classification Using a Fragment-Based Representation, 2000, Biologically Motivated Computer Vision.