Pose Refinement with Joint Optimization of Visual Points and Lines

High-precision camera re-localization in a pre-built 3D environment map underpins many tasks, such as augmented reality, robotics, and autonomous driving. Point-based visual re-localization approaches have matured over recent decades but remain insufficient in scenes with few point features. In this paper, we propose a point-line joint optimization method for pose refinement, built on a newly designed line-extraction CNN named VLSE together with a line matching and pose optimization approach. We adopt a novel line representation and customize a hybrid convolutional block based on the Stacked Hourglass network to detect accurate and stable line features in images. We then apply a coarse-to-fine strategy to obtain precise 2D-3D line correspondences based on geometric constraints. Finally, a point-line joint cost function is constructed to refine the camera pose, starting from an initial coarse pose. Extensive experiments on open datasets, i.e., the line extractor on Wireframe and YorkUrban and localization performance on Aachen Day-Night v1.1 and InLoc, confirm the effectiveness of our point-line joint pose optimization method.
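To make the idea of a point-line joint cost concrete, the following is a minimal sketch of one common formulation, not necessarily the cost used in this work, whose exact form the abstract does not give. It assumes a pinhole camera, point residuals defined as reprojection errors, and line residuals defined as the distances from the projected 3D line endpoints to the matched 2D line. All names (`project`, `point_residuals`, `line_residuals`, `joint_cost`, `w_line`) are hypothetical and chosen for illustration.

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points to Nx2 pixels with a pinhole model."""
    Xc = X @ R.T + t            # world frame -> camera frame
    uv = Xc @ K.T               # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]

def point_residuals(K, R, t, X, x):
    """Reprojection error between projected 3D points X and 2D points x."""
    return (project(K, R, t, X) - x).ravel()

def line_residuals(K, R, t, P, Q, lines2d):
    """Signed distances from projected 3D endpoints (P, Q) to 2D lines.

    Each row of lines2d is (a, b, c) with a*u + b*v + c = 0; rows are
    rescaled here so that a^2 + b^2 = 1 and the residual is in pixels.
    """
    n = lines2d / np.linalg.norm(lines2d[:, :2], axis=1, keepdims=True)
    res = []
    for E in (P, Q):
        uv = project(K, R, t, E)
        res.append(np.sum(n[:, :2] * uv, axis=1) + n[:, 2])
    return np.concatenate(res)

def joint_cost(K, R, t, X, x, P, Q, lines2d, w_line=1.0):
    """Sum of squared point and line residuals (w_line is an assumed weight)."""
    rp = point_residuals(K, R, t, X, x)
    rl = line_residuals(K, R, t, P, Q, lines2d)
    return 0.5 * (rp @ rp + w_line * (rl @ rl))
```

In a refinement setting, a cost of this kind would be minimized over the pose (R, t) with a nonlinear least-squares solver, starting from the initial coarse pose; a robust loss is often added in practice to limit the influence of wrong correspondences.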
