Low-level global features for vision-based localization

Vision-based self-localization is the ability to derive one's own location from visual input alone, without knowledge of a previous position or idiothetic information. It is often assumed that the visual mechanisms and invariance properties used for object recognition will also be helpful for localization. Here we show that this is neither logically reasonable nor empirically supported. We argue that the desirable invariance and generalization properties differ substantially between the two tasks. Application of several biologically inspired algorithms to various test sets reveals that simple, globally pooled features outperform the complex vision models used for object recognition when tested on localization. Such basic global image statistics should thus be considered as valuable priors for self-localization, both in vision research and robot applications.
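The abstract does not specify the exact feature set, but the idea of "simple, globally pooled features" can be sketched as follows. This is an illustrative toy example only (function names and the choice of an orientation histogram are assumptions, not the paper's method): gradient orientations are histogrammed over the entire frame, discarding spatial layout, and two views are compared by the distance between their pooled histograms.

```python
import numpy as np

def global_edge_histogram(img, n_bins=8):
    """Globally pooled low-level feature: one orientation histogram of
    image gradients, pooled over the whole frame. No spatial layout is
    kept, so the descriptor is a pure global image statistic."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ori = np.mod(np.arctan2(gy, gx), np.pi)     # orientation in [0, pi)
    hist, _ = np.histogram(ori, bins=n_bins, range=(0.0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

def place_distance(img_a, img_b):
    """Compare two views by L1 distance between their pooled histograms;
    smaller values suggest the same place."""
    return float(np.abs(global_edge_histogram(img_a)
                        - global_edge_histogram(img_b)).sum())
```

Because all spatial structure is pooled away, the descriptor is cheap to compute and largely insensitive to small viewpoint shifts, which is exactly the kind of invariance that helps place identification but would hurt fine-grained object recognition.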
