Visual Simultaneous Localization And Mapping With Stereo And Wide-angle Imagery

Mobile robots provide automated solutions for a range of tasks in industrial settings, including but not limited to inspection. Our interest is in automated inspection tasks, including gas leakage detection in natural gas processing facilities such as those in Qatar. Autonomous mobile robot solutions remove humans from potentially hazardous environments, eliminate human errors caused by fatigue, and provide data logging for visualization and off-line post-processing. A core requirement for a mobile robot to perform any meaningful inspection task is to localize itself within the operating environment. We are developing a visual Simultaneous Localization And Mapping (SLAM) system for this purpose. Visual SLAM systems enable a robot to localize within an environment while simultaneously building a metric 3D map using only imagery from an on-board camera head. Vision has many advantages over alternative sensors used for localization and mapping: it requires minimal power compared to Lidar sensors, is relatively inexpensive compared to Inertial Navigation Systems (INS), and can operate in GPS-denied environments. There is extensive work related to visual SLAM, with most systems using either a perspective stereo camera head or a wide-angle-of-view monocular camera. Stereo cameras enable Euclidean 3D reconstruction from a single stereo pair and provide metric pose estimates. However, the narrow angle of view can limit pose estimation accuracy, as visual features can typically be 'tracked' only across a small number of frames. The limited angle of view also presents challenges for place recognition, whereby previously visited locations are detected and loop closure is performed to correct long-range integrated position estimate inaccuracies. In contrast, wide-angle-of-view monocular cameras (e.g. fisheye and catadioptric) trade spatial resolution for an increased angle of view.
This increased angle of view enables visual scene points to be tracked over many frames and can improve rotational pose estimates. It can also improve visual place recognition performance, as the same areas of a scene can be imaged under much larger changes in position and orientation. The primary disadvantage of a monocular wide-angle visual SLAM system is the scale ambiguity in the translational component of its pose estimates. The visual SLAM system being developed in this work uses a combined stereo and wide-angle fisheye camera system with the aim of exploiting the advantages of each. For this we combine visual feature tracks from both the stereo and fisheye cameras within a single non-linear least-squares Sparse Bundle Adjustment (SBA) framework for localization. Initial experiments using large-scale image datasets (approximately 10 kilometers in length) collected within Education City have been used to evaluate improvements in localization accuracy with the combined system. Additionally, we have demonstrated performance improvements in visual place recognition using our existing Hidden Markov Model (HMM) based place recognition algorithm.
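The metric 3D reconstruction from a single stereo pair mentioned above can be sketched for a rectified stereo rig, where depth follows directly from the disparity between matched features. All numeric values below (focal length, baseline, principal point) are illustrative placeholders, not the calibration of the system described here:

```python
import numpy as np

def triangulate_stereo(uL, uR, v, f, baseline, cx, cy):
    """Recover a metric 3D point from one rectified stereo pair.

    uL, uR   : horizontal pixel coordinates of the same feature in the
               left and right images; v is its (shared) row coordinate.
    f        : focal length in pixels; baseline in metres.
    (cx, cy) : principal point in pixels.
    """
    disparity = uL - uR            # positive for points in front of the rig
    Z = f * baseline / disparity   # depth is inversely proportional to disparity
    X = (uL - cx) * Z / f
    Y = (v - cy) * Z / f
    return np.array([X, Y, Z])

# A feature with 10 px disparity, f = 700 px, and a 0.12 m baseline:
point = triangulate_stereo(uL=400.0, uR=390.0, v=250.0,
                           f=700.0, baseline=0.12, cx=320.0, cy=240.0)
# point is approximately [0.96, 0.12, 8.4] metres
```

This is why a stereo head yields metric pose estimates from the outset, with no scale ambiguity to resolve.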
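The idea of combining stereo and fisheye feature tracks in one non-linear least-squares framework can be illustrated with a toy example: reprojection residuals from both camera models are simply stacked into a single cost vector and minimized jointly. This sketch refines only one 3D scene point with fixed camera poses; a full SBA would also optimize the poses, and the pinhole/equidistant-fisheye models and all intrinsics used here are assumptions for illustration, not the paper's calibration:

```python
import numpy as np
from scipy.optimize import least_squares

F, CX, CY = 700.0, 320.0, 240.0   # pinhole intrinsics (pixels), made up
B = 0.12                          # stereo baseline (metres), made up
F_FISH = 300.0                    # equidistant fisheye focal (pixels), made up

def pinhole(P, tx=0.0):
    """Perspective projection, optionally offset by the stereo baseline."""
    X, Y, Z = P[0] - tx, P[1], P[2]
    return np.array([F * X / Z + CX, F * Y / Z + CY])

def fisheye(P):
    """Equidistant fisheye model: image radius proportional to off-axis angle."""
    r = np.linalg.norm(P[:2])
    theta = np.arctan2(r, P[2])
    scale = F_FISH * theta / r
    return np.array([scale * P[0] + CX, scale * P[1] + CY])

# Synthetic observations of one scene point in all three views.
P_true = np.array([0.8, 0.1, 8.0])
obs = [pinhole(P_true), pinhole(P_true, tx=B), fisheye(P_true)]

def residuals(P):
    # Stacking the per-camera reprojection errors into one vector is all
    # that is needed to optimize both sensors' tracks jointly.
    pred = [pinhole(P), pinhole(P, tx=B), fisheye(P)]
    return np.concatenate([p - o for p, o in zip(pred, obs)])

sol = least_squares(residuals, x0=np.array([1.0, 0.0, 5.0]))
```

The stereo residuals pin down metric scale, while the wide-angle residuals constrain the solution over a much larger angular range, which is the complementarity the combined system aims to exploit.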