Contextual visual localization: cascaded submap classification, optimized saliency detection, and fast view matching

In this paper, we present contextual visual localization, a novel coarse-to-fine approach to visual localization. The approach relies on three elements: (i) a minimal-complexity classifier that performs fast coarse localization (submap classification); (ii) an optimized saliency detector that exploits the visual statistics of the submap; and (iii) a fast view-matching algorithm that filters the initial matches with a structural criterion and yields the fine localization. Our experiments show that these elements have been successfully integrated to solve the global localization problem. Context, that is, the awareness of being in a particular submap, is defined by a supervised classifier tuned to use a minimal set of features. This visual context is exploited both to tune (optimize) the saliency detection process and to select, from the visual database, candidate views that are close enough to the query view to be matched.
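The coarse-to-fine flow described above can be illustrated with a minimal Python sketch. It is not the method of the paper: the nearest-centroid classifier, the per-submap saliency threshold, and the pairwise-distance consistency test are simple stand-ins for the supervised classifier, the optimized saliency detector, and the structural criterion, and every name and value in it (`classify_submap`, `salient_points`, `structural_filter`, `localize`, the toy data) is hypothetical.

```python
import math

def classify_submap(features, centroids, selected):
    """Coarse step: nearest centroid, using only the selected feature indices."""
    def sq_dist(c):
        return sum((features[i] - c[i]) ** 2 for i in selected)
    return min(centroids, key=lambda name: sq_dist(centroids[name]))

def salient_points(points, threshold):
    """Keep (x, y, descriptor) for points whose saliency passes the submap threshold."""
    return [(x, y, d) for x, y, s, d in points if s >= threshold]

def initial_matches(query_pts, view_pts):
    """Pair each query point with the view point of closest (scalar) descriptor."""
    if not view_pts:
        return []
    matches = []
    for qx, qy, qd in query_pts:
        vx, vy, _ = min(view_pts, key=lambda v: abs(v[2] - qd))
        matches.append(((qx, qy), (vx, vy)))
    return matches

def structural_filter(matches, tol=0.2):
    """Keep matches whose distances to most other matches agree in both views."""
    kept = []
    for i, (p, q) in enumerate(matches):
        votes = 0
        for j, (p2, q2) in enumerate(matches):
            if i == j:
                continue
            dp, dq = math.dist(p, p2), math.dist(q, q2)
            if abs(dp - dq) <= tol * max(dp, dq, 1e-9):
                votes += 1
        if votes >= (len(matches) - 1) / 2:
            kept.append((p, q))
    return kept

def localize(query, database, centroids, selected, thresholds):
    """Coarse submap classification, then fine view matching inside that submap."""
    submap = classify_submap(query["features"], centroids, selected)
    q_pts = salient_points(query["points"], thresholds[submap])
    best_id, best_score = None, -1
    for view in database[submap]:
        v_pts = salient_points(view["points"], thresholds[submap])
        score = len(structural_filter(initial_matches(q_pts, v_pts)))
        if score > best_score:
            best_id, best_score = view["id"], score
    return submap, best_id, best_score

# Toy data (made up): points are (x, y, saliency, descriptor).
centroids = {"corridor": [0.1, 0.9, 0.5], "lab": [0.8, 0.2, 0.5]}
selected = [0, 1]                      # minimal feature subset used by the classifier
thresholds = {"corridor": 0.5, "lab": 0.3}
database = {
    "corridor": [],
    "lab": [
        {"id": "lab-1", "points": [(1, 1, 0.9, 1.0), (5, 1, 0.8, 2.0), (1, 4, 0.7, 3.0)]},
        {"id": "lab-2", "points": [(1, 1, 0.9, 1.0), (20, 1, 0.8, 2.0), (1, 30, 0.7, 3.0)]},
    ],
}
query = {
    "features": [0.75, 0.25, 0.0],
    "points": [(0, 0, 0.9, 1.0), (4, 0, 0.8, 2.0), (0, 3, 0.7, 3.0), (9, 9, 0.1, 4.0)],
}

result = localize(query, database, centroids, selected, thresholds)
print(result)  # ('lab', 'lab-1', 3)
```

In this toy setup both database views match the query on descriptors alone, and only the structural check separates them: `lab-1` preserves the pairwise distances of the query points, while `lab-2` does not. Restricting the search to the classified submap is what keeps the fine matching step small.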
