Robust Scene Stitching in Large Scale Mobile Mapping

This paper presents a solution to the loop-closure problem in an image-based mobile mapping context. A van equipped with stereo cameras collects recordings in an urban environment while simultaneously logging GPS information. Using Structure-from-Motion, the position of the van and the structure of its surroundings are retrieved. The estimation of the van's translation and orientation is recursive: a slight drift can gradually build up into flawed localizations. One can rely on the GPS information to perform adjustments, but its accuracy is not adequate to yield a model with high precision. Visual loop closing, recognizing that a location is revisited, can help mitigate the issue. The current system does not take into account possible reoccurrences of identical features in distant recordings; this paper adds such loop closure. Local feature matching in two stages detects when a particular site is revisited, in order to enforce correspondences between images that may have been taken with large time lapses in between. Our system relies on GPS but does not use odometric information. We extend the original image-to-image matching approach to a pose-to-pose matching approach, combining several images and achieving robust scene-matching results. Parameter optimization is followed by extensive experiments. Our pipeline, which facilitates parallel execution, reaches matching rates higher than those reported for typical state-of-the-art algorithms. We also demonstrate robustness to odometric inconsistencies resulting from poor prior model build-up.

Loop closure is crucial for high-accuracy models. The current state of the art in topological mapping is formed by FAB-MAP [3] and CAT-SLAM [4], but these limit themselves to a binary decision, i.e. whether or not a location has been visited before. In the envisioned application, however, it is desirable to have actual image point correspondences to facilitate bundle adjustment.
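The drift issue described above can be made concrete with a small sketch: chaining noisy relative motions, as recursive pose estimation does, lets a tiny per-step heading error compound into a large position error. This is an illustrative 2D simplification with made-up noise figures, not a model of the actual system.

```python
import numpy as np

def compose(pose, step, dtheta):
    """Chain a relative motion (step metres forward, dtheta heading change)
    onto a 2D pose (x, y, theta), as recursive odometry does."""
    x, y, theta = pose
    theta += dtheta
    return (x + step * np.cos(theta), y + step * np.sin(theta), theta)

rng = np.random.default_rng(0)
true_pose = (0.0, 0.0, 0.0)
est_pose = (0.0, 0.0, 0.0)
for _ in range(1000):
    true_pose = compose(true_pose, 1.0, 0.0)                  # straight 1 m steps
    est_pose = compose(est_pose, 1.0, rng.normal(0.0, 0.002))  # tiny heading noise

drift = np.hypot(est_pose[0] - true_pose[0], est_pose[1] - true_pose[1])
print(f"position error after 1 km: {drift:.1f} m")
```

Because the heading error is integrated twice (heading, then position), the error grows much faster than the per-step noise suggests, which is exactly why loop closure or GPS correction becomes necessary.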
An approach closer to this goal, directly attempting to match local features among images, is described in [5]. The epipolar constraint it uses to cope with false positives is, however, not error-free. The approach devised here implements more robust rejection of erroneous matches. It consists of two major steps. First, revisited sites are detected over time by clustering the GPS information while taking its inaccuracy into account. Next, within every route of such a cluster, two poses that are expected to contain common elements are selected, based on a Naive Bayesian matching framework operating on severely downscaled images. When such a cross-route pose pair is obtained, re-occurrences of the same physical points are tracked in the associated images. This problem is treated in two steps: single-pose matching and cross-route image matching. The former finds matches and derives corresponding 3D points using the SURF [1] detector and descriptor among views taken from the same van position. Since the camera calibration is available, an epipolar consistency check is straightforward. The surviving matches yield a point cloud for every van pose. The latter step, cross-route image matching, attempts to match the images from different van poses, again using SURF. This establishes a link between the two earlier constructed point clouds. PROSAC [2], a prioritized RANSAC algorithm, is applied to robustly calculate the transformation between the two point clouds in a time-optimal way while pruning out false positives. Figure 1 provides an illustration.

This paper has two main contributions. First, our novel loop-closure technique does not depend on single image pairs for correspondences. Instead, a point cloud is constructed around each of two van poses and these clouds are fitted together; an image-to-image approach is extended to a pose-to-pose approach.
Second, our method is able to detect matches in challenging wide-baseline conditions, where other systems tend to fail. Since it concerns a system-specific application, a specialized dataset was devised that comprises a substantial number of images from an urban environment. Separate datasets were used for parameter tuning and subsequent evaluation.

Figure 1: Illustration of cross-route image matches after pruning of false positives by means of PROSAC. Note that these are not the only correspondences found; matches are also tracked for other image combinations.
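The first stage of the approach, detecting revisited sites from inaccurate GPS fixes, can be sketched as a proximity test between fixes of different routes. This is a hedged illustration rather than the paper's actual clustering: the helper name `revisit_candidates` and the 10 m radius (meant to reflect GPS inaccuracy) are assumptions for the sketch.

```python
import numpy as np

def revisit_candidates(fixes, routes, radius=10.0):
    """Flag pairs of GPS fixes from *different* routes that lie within
    `radius` metres of each other: candidate loop-closure sites."""
    fixes = np.asarray(fixes, float)
    routes = np.asarray(routes)
    pairs = []
    for i in range(len(fixes)):
        d = np.linalg.norm(fixes - fixes[i], axis=1)
        close = (d < radius) & (routes != routes[i]) & (np.arange(len(fixes)) > i)
        pairs.extend((i, int(j)) for j in np.nonzero(close)[0])
    return pairs

# Two routes passing through the same street corner near (100, 50),
# in local metric coordinates (hypothetical data).
route_a = [(0, 0), (50, 20), (100, 50)]
route_b = [(200, 0), (150, 30), (103, 48)]
fixes = route_a + route_b
routes = [0, 0, 0, 1, 1, 1]
print(revisit_candidates(fixes, routes))   # → [(2, 5)]
```

Each flagged pair would then be handed to the Naive Bayesian pose-pair selection and the SURF-based matching stages described above.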

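The final step, robustly fitting a rigid transformation between the two per-pose point clouds, can be sketched as follows. This is a minimal stand-in, not the paper's implementation: it combines the Kabsch least-squares rigid alignment with PROSAC-style prioritized sampling, in which minimal samples are drawn from a progressively growing pool of the best-scoring correspondences so that good hypotheses tend to appear early. Function names, the inlier threshold, and the iteration count are illustrative.

```python
import numpy as np

def rigid_from_correspondences(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def prosac_rigid(P, Q, scores, iters=200, thresh=0.05, seed=0):
    """PROSAC-style estimation: correspondences sorted by match score,
    minimal samples drawn from a progressively growing top-ranked pool."""
    order = np.argsort(-scores)                 # best matches first
    Ps, Qs = P[order], Q[order]
    rng = np.random.default_rng(seed)
    best = (None, None, np.zeros(len(Ps), bool))
    for i in range(iters):
        m = min(len(Ps), 3 + i * (len(Ps) - 3) // iters)   # growing pool
        idx = rng.choice(m, size=3, replace=False)
        R, t = rigid_from_correspondences(Ps[idx], Qs[idx])
        inl = np.linalg.norm((Ps @ R.T + t) - Qs, axis=1) < thresh
        if inl.sum() > best[2].sum():
            best = (R, t, inl)
    R, t = rigid_from_correspondences(Ps[best[2]], Qs[best[2]])  # refit on inliers
    out = np.zeros(len(Ps), bool)
    out[order] = best[2]                        # map inlier mask back to input order
    return R, t, out

# Synthetic demo: 100 correspondences, the first 30 corrupted (false positives).
rng = np.random.default_rng(1)
P = rng.uniform(-1, 1, (100, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Q = P @ R_true.T + t_true
Q[:30] += rng.uniform(-2, 2, (30, 3))          # 30% false positives
scores = np.where(np.arange(100) < 30, 0.2, 0.9)  # outliers rank low
R, t, inliers = prosac_rigid(P, Q, scores)
```

Because the false positives are ranked last, the very first samples already come from clean correspondences, which is the time-optimality argument behind prioritized sampling.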
References

[1] Nicholas Roy et al. Towards Persistent Localization and Mapping with a Continuous Appearance-Based Topology. 2013.

[2] Naser El-Sheimy et al. Mobile Mapping Systems – State of the Art and Future Trends. 2004.

[3] Josef Sivic and Andrew Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003.

[4] Jing Huang et al. Point Cloud Matching Based on 3D Self-Similarity. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2012.

[5] Krystian Mikolajczyk and Cordelia Schmid. Scale & Affine Invariant Interest Point Detectors. International Journal of Computer Vision, 2004.

[6] Tony Lindeberg. Feature Detection with Automatic Scale Selection. International Journal of Computer Vision, 1998.

[7] Will Maddern, Michael Milford, and Gordon Wyeth. CAT-SLAM: Probabilistic Localisation and Mapping Using a Continuous Appearance-Based Trajectory. International Journal of Robotics Research, 2012.

[8] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004.

[9] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.

[10] Christopher G. Harris and Mike Stephens. A Combined Corner and Edge Detector. Alvey Vision Conference, 1988.

[11] Arren Glover, William Maddern, Michael Warren, Stephanie Reid, Michael Milford, and Gordon Wyeth. OpenFABMAP: An Open Source Toolbox for Appearance-Based Loop Closure Detection. IEEE International Conference on Robotics and Automation, 2012.

[12] Rohan Paul and Paul Newman. FAB-MAP 3D: Topological Mapping with Spatial and Visual Appearance. IEEE International Conference on Robotics and Automation, 2010.

[13] Mark Cummins and Paul Newman. FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. International Journal of Robotics Research, 2008.

[14] Ondrej Chum and Jiri Matas. Matching with PROSAC – Progressive Sample Consensus. IEEE Conference on Computer Vision and Pattern Recognition, 2005.

[15] Marius Muja and David G. Lowe. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration. VISAPP, 2009.

[16] Stephen Se, David G. Lowe, and James J. Little. Mobile Robot Localization and Mapping with Uncertainty Using Scale-Invariant Visual Landmarks. International Journal of Robotics Research, 2002.

[17] Mark Cummins and Paul Newman. Accelerated Appearance-Only SLAM. IEEE International Conference on Robotics and Automation, 2008.

[18] Niko Sünderhauf and Peter Protzel. BRIEF-Gist – Closing the Loop by Simple Means. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011.

[19] Radu Timofte and Luc Van Gool. Naive Bayes Image Classification: Beyond Nearest Neighbors. ACCV, 2012.

[20] Hongbin Zha et al. Visual Localization and Loopback Detection with a High Resolution Omnidirectional Camera. 2005.

[21] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 2008.

[22] Martin A. Fischler and Robert C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, 1981.