Visual attention and object categorization: from psychophysics to computational models

This thesis is arranged in two main parts. Each part combines psychophysics and computational modeling to bring abstract or high-level theories of vision closer to a concrete neurobiological foundation.

The first part addresses visual object categorization. Previous studies using high-level models of categorization have left unresolved issues of neurobiological relevance, including how features are extracted from the image and what role memory capacity plays in categorization performance. We compared the ability of a comprehensive set of models to match the categorization performance of human observers while explicitly accounting for each model's number of free parameters. The most successful models did not require a large memory capacity, suggesting that a sparse, abstracted representation of category properties may underlie categorization performance. This type of representation, which differs from classical prototype abstraction, could also be extracted directly from two-dimensional images via a biologically plausible early vision model, rather than relying on experimenter-imposed features.

The second part addresses visual attention in its bottom-up, stimulus-driven form. Previous research showed that a model of bottom-up visual attention can account in part for the locations fixated by humans while free-viewing complex natural and artificial scenes. We used a similar framework to quantify how the predictive ability of such a model may be enhanced by new model components based on several specific mechanisms within the functional architecture of the visual system. These components included richer interactions among orientation-tuned units, both at short range (for clutter reduction) and at long range (for contour facilitation). Subjects free-viewed naturalistic and artificial images while their eye movements were recorded. The resulting fixation locations were compared with the models' predicted salience maps. We found that each new model component was important in attaining a strong quantitative correspondence between model and behavior. Finally, we compared the model predictions with the spatial locations obtained from a task that relied on mouse clicking rather than eye tracking. As these models become more accurate in predicting behaviorally relevant salient locations, they become useful to a range of applications in computer vision and human-machine interface design.
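
As a rough illustration of how recorded fixation locations might be compared with a model's predicted salience map, the sketch below scores a map by the mean z-scored salience at the fixated pixels. The abstract does not say which measure was used, so this metric, the function name `fixation_salience_score`, and the toy data are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def fixation_salience_score(salience_map, fixations):
    """Mean z-scored salience at fixated pixels (an assumed metric).

    salience_map: list of rows of floats (hypothetical model output).
    fixations: list of (row, col) pixel coordinates of recorded fixations.
    A score near 0 means fixated pixels have about average salience;
    a positive score means they are more salient than the map average.
    """
    values = [v for row in salience_map for v in row]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("salience map is constant; score is undefined")
    return mean((salience_map[r][c] - mu) / sigma for r, c in fixations)


# Toy 3x3 map with one salient peak in the centre (made-up numbers).
toy_map = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
on_peak = fixation_salience_score(toy_map, [(1, 1)])           # fixation on the peak
off_peak = fixation_salience_score(toy_map, [(0, 0), (2, 2)])  # fixations in the corners
```

Under this measure, a model component that improves the prediction of behavior would raise the score averaged across images and observers; comparing scores for model variants with and without a given component is one plausible way to quantify that component's contribution.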
