NIMBLE: a kernel density model of saccade-based visual memory.

We present a Bayesian version of J. Lacroix, J. Murre, and E. Postma's (2006) Natural Input Memory (NIM) model of saccadic visual memory. Our model, which we call NIMBLE (NIM with Bayesian Likelihood Estimation), uses a cognitively plausible image sampling technique that provides a foveated representation of image patches. We conceive of these memorized image fragments as samples from image class distributions and model the memory of these fragments using kernel density estimation. Using these models, we derive class-conditional probabilities of new image fragments and combine individual fragment probabilities to classify images. Our Bayesian formulation of the model extends easily to handle multi-class problems. We validate our model by demonstrating human levels of performance on a face recognition memory task and high accuracy on multi-category face and object identification. We also use NIMBLE to examine the change in beliefs as more fixations are taken from an image. Using fixation data collected from human subjects, we directly compare the performance of NIMBLE's memory component to human performance, demonstrating that using human fixation locations allows NIMBLE to recognize familiar faces with only a single fixation.
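The classification step described above (kernel density estimates over memorized fragments, class-conditional fragment probabilities, and their combination into an image-level decision) can be illustrated with a minimal sketch. It is not the authors' implementation. The isotropic Gaussian kernel, the fixed `bandwidth`, the uniform class prior, the assumption that fragments are conditionally independent given the class, and the names `log_kde` and `class_log_posterior` are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def log_kde(fragments, exemplars, bandwidth):
    """Log density of each fragment under an isotropic Gaussian KDE.

    fragments: (m, d) array of feature vectors from the image being classified.
    exemplars: (n, d) array of memorized fragment features for one class.
    """
    d = exemplars.shape[1]
    # Squared Euclidean distance between every fragment and every exemplar.
    diff = fragments[:, None, :] - exemplars[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)  # shape (m, n)
    log_kernel = (-0.5 * sq_dist / bandwidth ** 2
                  - 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2))
    # Average the kernels in log space (log-mean-exp) for numerical stability.
    peak = log_kernel.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.mean(np.exp(log_kernel - peak), axis=1))


def class_log_posterior(fragments, memory, bandwidth=1.0):
    """Log posterior over classes given all fragments, with a uniform prior.

    memory: dict mapping class label -> (n, d) array of memorized fragments.
    """
    classes = sorted(memory)
    # Assumption: fragments are conditionally independent given the class,
    # so per-fragment log-likelihoods simply add.
    log_joint = np.array(
        [log_kde(fragments, memory[c], bandwidth).sum() for c in classes]
    )
    # Normalize with log-sum-exp; the uniform prior cancels out.
    peak = log_joint.max()
    log_norm = peak + np.log(np.sum(np.exp(log_joint - peak)))
    return dict(zip(classes, log_joint - log_norm))


# Toy usage: two hypothetical classes with well-separated fragment features.
rng = np.random.default_rng(0)
memory = {
    "face_A": rng.normal(0.0, 1.0, size=(20, 4)),
    "face_B": rng.normal(5.0, 1.0, size=(20, 4)),
}
query = rng.normal(0.0, 1.0, size=(3, 4))  # three "fixations" from face_A
posterior = class_log_posterior(query, memory)
best = max(posterior, key=posterior.get)
```

Calling `class_log_posterior` on `query[:1]`, `query[:2]`, and so on gives one way to trace how the posterior changes as fixations accumulate, in the spirit of the belief-update analysis mentioned in the abstract.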

[1]  Preeti Verghese, et al. Where to look next? Eye movements reduce local uncertainty, 2007, Journal of Vision.

[2]  A. L. Yarbus, et al. Eye Movements and Vision, 1967, Springer US.

[3]  J. Wolfe, et al. Guided Search 2.0: A revised model of visual search, 1994, Psychonomic Bulletin & Review.

[4]  C. Koch, et al. Computational modelling of visual attention, 2001, Nature Reviews Neuroscience.

[5]  David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[6]  Michel Vidal-Naquet, et al. Visual features of intermediate complexity and their use in classification, 2002, Nature Neuroscience.

[7]  I. Gauthier, et al. Visual object understanding, 2004, Nature Reviews Neuroscience.

[8]  Garrison W. Cottrell, et al. Facial Memory Is Kernel Density Estimation (Almost), 1998, NIPS.

[9]  Eric O. Postma, et al. The Natural Input Memory Model, 2005.

[10]  J. Henderson. Human gaze control during real-world scene perception, 2003, Trends in Cognitive Sciences.

[11]  Sameer A. Nene, et al. Columbia Object Image Library (COIL100), 1996.

[12]  Harry Wechsler, et al. The FERET database and evaluation procedure for face-recognition algorithms, 1998, Image Vis. Comput.

[13]  Tim K. Marks, et al. SUN: A Bayesian framework for saliency using natural statistics, 2008, Journal of Vision.

[14]  Antonio Torralba, et al. Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search, 2006, Psychological Review.

[15]  Carrick C. Williams, et al. Eye movements are functional during face learning, 2005, Memory & Cognition.

[16]  J. P. Jones, et al. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, 1987, Journal of Neurophysiology.

[17]  G. Cottrell, et al. Two Fixations Suffice in Face Recognition, 2008, Psychological Science.

[18]  Alan C. Bovik, et al. Contrast statistics for foveated visual systems: fixation selection by minimizing contrast entropy, 2005, Journal of the Optical Society of America A: Optics, Image Science, and Vision.

[19]  Eric O. Postma, et al. Modeling Recognition Memory Using the Similarity Structure of Natural Input, 2006, Cogn. Sci.

[20]  A. L. I︠A︡rbus. Eye Movements and Vision, 1967.

[21]  Jitendra Malik, et al. Shape matching and object recognition using shape contexts, 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[22]  T. Foulsham, et al. What can saliency models predict about eye movements? Spatial and sequential aspects of fixations during encoding and recognition, 2008, Journal of Vision.

[23]  Garrison W. Cottrell, et al. A probabilistic model of eye movements in concept formation, 2007, Neurocomputing.

[24]  Garrison W. Cottrell, et al. A model of scan paths applied to face recognition, 2004.

[25]  Jiri Matas, et al. On Combining Classifiers, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[26]  Benjamin W. Tatler, et al. The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions, 2007, Journal of Vision.

[27]  Jitendra Malik, et al. An Information Maximization Model of Eye Movements, 2004, NIPS.

[28]  Brad Duchaine, et al. Dissociations of Face and Object Recognition in Developmental Prosopagnosia, 2005, Journal of Cognitive Neuroscience.

[29]  Wei Zhang, et al. The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search, 2005, NIPS.

[30]  G. Cottrell, et al. EMPATH: A Neural Network that Categorizes Facial Expressions, 2002, Journal of Cognitive Neuroscience.

[31]  Thomas J. Palmeri, et al. An Exemplar-Based Random Walk Model of Speeded Classification, 1997.

[32]  Richard M. Shiffrin, et al. Models of Memory, 2002.

[33]  L. Stark, et al. Scanpaths in saccadic eye movements while viewing and recognizing patterns, 1971, Vision Research.

[34]  Michael C. Mozer, et al. Top-Down Control of Visual Attention: A Rational Account, 2005, NIPS.

[35]  Serge J. Belongie, et al. Behavior recognition via sparse spatio-temporal features, 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[36]  R. Nosofsky, et al. An exemplar-based random walk model of speeded classification, 1997, Psychological Review.

[37]  Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[38]  Eric O. Postma, et al. Modeling Visual Classification using Bottom-up and Top-down Fixation Selection, 2007.

[39]  Douglas L. Hintzman, et al. MINERVA 2: A simulation model of human memory, 1984.