A dual foveal-peripheral visual processing model implements efficient saccade selection

We develop a visuomotor model that implements visual search as a focal accuracy-seeking policy, with the target’s position and category drawn independently from a common generative process. Consistent with the anatomical separation between the ventral and dorsal pathways, the model is composed of two pathways that respectively infer what to see and where to look. The “What” network is a classical deep-learning classifier that only processes a small region around the center of fixation, providing a “foveal” accuracy. In contrast, the “Where” network processes the full visual field in a biomimetic fashion, using a log-polar retinotopic encoding that is preserved up to the action-selection level. The foveal accuracy is used to train the “Where” network, which, after training, provides an “accuracy map” that guides the eye toward peripheral objects. Comparing the accuracies of the two networks amounts to either selecting a saccade or keeping the eye at the center to identify the target. We test this setup on the simple task of finding a digit in a large, cluttered image. Our simulation results demonstrate the effectiveness of this approach, increasing by one order of magnitude the radius of the visual field within which the agent can detect and recognize a target, either through a single saccade or through multiple ones. Importantly, our log-polar treatment of the visual information exploits the strong compression performed at the sensory level, providing a way to implement visual search in a sub-linear fashion, in contrast with mainstream computer vision.
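
To make the dual-pathway decision rule concrete, below is a minimal PyTorch sketch, not the authors' implementation: the layer sizes, the 24×10 log-polar grid, the 6-channel retinal code, and the use of softmax confidence as a stand-in for foveal accuracy are all illustrative assumptions. The “Where” network outputs one predicted post-saccadic accuracy per log-polar cell; a saccade is triggered only when the best peripheral prediction beats the central estimate.

```python
# Minimal sketch of a dual "What"/"Where" architecture (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class WhatNet(nn.Module):
    """Classifies a small crop (assumed 28x28) around the fixation point."""

    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 7 * 7, n_classes)

    def forward(self, fovea):                      # fovea: (B, 1, 28, 28)
        h = self.conv(fovea).flatten(1)
        return F.log_softmax(self.fc(h), dim=1)


class WhereNet(nn.Module):
    """Maps a log-polar retinal code of the full field to an accuracy map:
    one predicted post-saccadic accuracy per (azimuth, eccentricity) cell."""

    def __init__(self, n_in: int, n_theta: int = 24, n_ecc: int = 10):
        super().__init__()
        self.n_theta, self.n_ecc = n_theta, n_ecc
        self.mlp = nn.Sequential(
            nn.Linear(n_in, 256), nn.ReLU(),
            nn.Linear(256, n_theta * n_ecc), nn.Sigmoid(),
        )

    def forward(self, retina):                     # retina: (B, n_in)
        return self.mlp(retina).view(-1, self.n_theta, self.n_ecc)


def saccade_or_fixate(what_net, where_net, fovea, retina):
    """Compare foveal accuracy (softmax confidence, as a proxy) with the best
    peripheral prediction; saccade to the best cell if it wins."""
    with torch.no_grad():
        central_acc = what_net(fovea).exp().max(dim=1).values     # (B,)
        acc_map = where_net(retina)                               # (B, T, E)
        best_acc, flat_idx = acc_map.flatten(1).max(dim=1)        # (B,)
        do_saccade = best_acc > central_acc
        theta_idx = flat_idx // acc_map.shape[2]                  # azimuth index
        ecc_idx = flat_idx % acc_map.shape[2]                     # eccentricity index
    return do_saccade, theta_idx, ecc_idx


if __name__ == "__main__":
    # Hypothetical retinal code: 24 azimuths x 10 eccentricities x 6 filter channels.
    what, where = WhatNet(), WhereNet(n_in=24 * 10 * 6)
    fovea, retina = torch.randn(4, 1, 28, 28), torch.randn(4, 24 * 10 * 6)
    print(saccade_or_fixate(what, where, fovea, retina))
```

In this sketch the “Where” pathway never sees class labels: it is trained to regress the accuracy that the “What” classifier would obtain after a saccade to each cell, which is the sense in which the foveal accuracy supervises the peripheral map.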
