Absolute Spatial Context-aware visual feature descriptors for outdoor handheld camera localization overcoming visual repetitiveness in urban environments

We present a framework that enables 6DoF camera localization in outdoor environments by providing visual feature descriptors with an Absolute Spatial Context (ASPAC). These descriptors combine visual information from the image patch around a feature with spatial information, based on a model of the environment and the readings of sensors attached to the camera, such as GPS, accelerometers, and a digital compass. The result is a more distinct description of features in the camera image, which correspond to 3D points in the environment. This is particularly helpful in urban environments containing large amounts of repetitive visual features. Additionally, we describe the first comprehensive test database for outdoor handheld camera localization, comprising over 45,000 real camera images of an urban environment captured under natural camera motions and different illumination settings. For all these images, the dataset not only contains readings of the sensors attached to the camera, but also ground truth information on the full 6DoF camera pose, and the geometry and texture of the environment. Based on this dataset, which we have made available to the public, we show that using our proposed framework provides both faster matching and better localization results compared to state-of-the-art methods.
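The core idea of the abstract — disambiguating visually repetitive features by attaching spatial context to each descriptor before matching — can be sketched roughly as follows. This is a minimal illustration, not the paper's actual formulation: the function names, the simple concatenation scheme, and the weighting factor are all assumptions made for the sketch.

```python
import math

def augment_descriptor(visual, spatial, weight=0.5):
    """Append a spatial-context vector to a visual descriptor.

    Both parts are L2-normalized; `weight` balances how strongly
    the spatial context influences matching. Names and scheme are
    illustrative assumptions, not the paper's ASPAC definition.
    """
    def l2norm(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]
    return l2norm(visual) + [weight * x for x in l2norm(spatial)]

def match(query, database):
    """Brute-force nearest-neighbor match by Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(database)), key=lambda i: dist(query, database[i]))

# Two facade features with identical visual appearance (e.g. repeated
# windows) become distinguishable once spatial context is attached.
d_left = augment_descriptor([1.0, 0.0], [0.0, 1.0])
d_right = augment_descriptor([1.0, 0.0], [1.0, 0.0])
query = augment_descriptor([1.0, 0.0], [0.0, 1.0])
best = match(query, [d_right, d_left])  # resolves to d_left
```

With purely visual descriptors the two database entries would be indistinguishable; the spatial part, derived in the paper from sensor readings and an environment model, is what breaks the tie.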
