Object detection through search with a foveated visual system

We present a foveated object detector (FOD) as a biologically inspired alternative to the sliding window (SW) approach, which is the dominant search method in computer vision object detection. Like the human visual system, the FOD has higher resolution at the fovea and lower resolution in the visual periphery, so it allocates more computational resources to the fovea and fewer to the periphery. The FOD processes the entire scene, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image, and integrates observations across multiple fixations. Our approach combines modern object detectors from computer vision with a recent model of the peripheral pooling regions found at the V1 layer of the human visual system. We assess various eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD performs on par with the SW detector while substantially reducing computational cost.
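The fixation loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the grid of detector evidence, the linear rule for pooling radius, the additive integration of observations, and the names `pooling_radius`, `observe`, and `foveated_search` are all assumptions chosen for illustration. The sketch only shows the general idea that peripheral locations are seen through larger pooling regions and that evidence accumulated across fixations guides the next eye movement.

```python
import math


def pooling_radius(row, col, fixation, slope=0.5):
    """Assumed rule: pooling radius (in cells) grows linearly with eccentricity."""
    eccentricity = math.hypot(row - fixation[0], col - fixation[1])
    return int(slope * eccentricity)


def observe(evidence, fixation, slope=0.5):
    """Return what the toy retina sees from one fixation.

    Each cell reports the mean evidence over a square window whose size
    grows with distance from the fixation, so the periphery is blurred.
    """
    rows, cols = len(evidence), len(evidence[0])
    observation = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            k = pooling_radius(r, c, fixation, slope)
            window = [
                evidence[i][j]
                for i in range(max(0, r - k), min(rows, r + k + 1))
                for j in range(max(0, c - k), min(cols, c + k + 1))
            ]
            observation[r][c] = sum(window) / len(window)
    return observation


def foveated_search(evidence, start, n_fixations=3, slope=0.5):
    """Greedy search: sum observations across fixations, then move the
    fovea to the cell with the highest accumulated evidence."""
    rows, cols = len(evidence), len(evidence[0])
    belief = [[0.0] * cols for _ in range(rows)]
    fixation = start
    path = [start]
    for _ in range(n_fixations):
        observation = observe(evidence, fixation, slope)
        for r in range(rows):
            for c in range(cols):
                belief[r][c] += observation[r][c]
        best = max(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: belief[rc[0]][rc[1]],
        )
        if best == fixation:
            break  # the fovea already sits on the strongest location
        fixation = best
        path.append(fixation)
    return fixation, path


# Hypothetical 9x9 evidence map with a single target at (7, 6).
grid = [[0.0] * 9 for _ in range(9)]
grid[7][6] = 1.0
found, path = foveated_search(grid, start=(4, 4))
print(found, path)
```

In this toy setting the first fixation sees the target only as a blurred peripheral response, which is enough to pull the fovea toward it; a later fixation resolves it at full resolution. A real detector would replace the evidence grid with classifier scores computed on pooled image features.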
