Fast, yet robust end-to-end camera pose estimation for robotic applications

Camera pose estimation in robotic applications is paramount. Most of recent algorithms based on convolutional neural networks demonstrate that they are able to predict the camera pose adequately. However, they usually suffer from the computational complexity which prevent them from running in real-time. Additionally, they are not robust to perturbations such as partial occlusion while they have not been trained on such cases beforehand. To study these limitations, this paper presents a fast and robust end-to-end Siamese convolutional model for robot-camera pose estimation. Two colored-frames are fed to the model at the same time, and the generic features are produced mainly based on the transfer learning. The extracted features are then concatenated, from which the relative pose is directly obtained at the output. Furthermore, a new dataset is generated, which includes several videos taken at various situations for the model evaluation. The proposed technique shows a robust performance even in challenging scenes, which have not been rehearsed during the training phase. Through the experiments conducted with an eye-in-hand KUKA robotic arm, the presented network renders fairly accurate results on camera pose estimation despite scene-illumination changes. Also, the pose estimation is conducted with reasonable accuracy in presence of partial camera occlusion. The results are enhanced by defining a new dynamic weighted loss function. The proposed method is further exploited in visual servoing scenario.

[1]  G. Oriolo,et al.  Robotics: Modelling, Planning and Control , 2008 .

[2]  Luigi Villani,et al.  Visual servoing with safe interaction using image moments , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[3]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Angel Domingo Sappa,et al.  Deep Learning Based Camera Pose Estimation in Multi-view Environment , 2018, 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS).

[5]  Radu Horaud,et al.  A Comprehensive Analysis of Deep Regression , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Dorian Gálvez-López,et al.  Bags of Binary Words for Fast Place Recognition in Image Sequences , 2012, IEEE Transactions on Robotics.

[7]  Esa Rahtu,et al.  Relative Camera Pose Estimation Using Convolutional Neural Networks , 2017, ACIVS.

[8]  Peter I. Corke,et al.  Training Deep Neural Networks for Visual Servoing , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[9]  Javier Ruiz-del-Solar,et al.  A Survey on Deep Learning Methods for Robot Vision , 2018, ArXiv.

[10]  Olaf Kähler,et al.  Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices , 2015, IEEE Transactions on Visualization and Computer Graphics.

[11]  Luigi di Stefano,et al.  Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Genlin Ji,et al.  Application of Time-varying Acceleration Coefficients PSO to Face Pose Estimation , 2015 .

[13]  Nassir Navab,et al.  Joint motion boundary detection and CNN-based feature visualization for video object segmentation , 2019, Neural Computing and Applications.

[14]  S. Amirhassan Monadjemi,et al.  Co-segmentation via visualization , 2018, J. Vis. Commun. Image Represent..

[15]  Zhaoxiang Liu,et al.  Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry , 2018, PRICAI.

[16]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[17]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[18]  Esa Rahtu,et al.  Image-Based Localization Using Hourglass Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[19]  Roland Siegwart,et al.  From Coarse to Fine: Robust Hierarchical Localization at Large Scale , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[21]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[22]  Andrew Zisserman,et al.  Efficient Visual Search of Videos Cast as Text Retrieval , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Matthias Nießner,et al.  SemanticPaint , 2015, ACM Trans. Graph..

[24]  Eric Brachmann,et al.  Learning Less is More - 6D Camera Localization via 3D Surface Regression , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Manlu Liu,et al.  Stereo Cameras Self-Calibration Based on SIFT , 2009, 2009 International Conference on Measuring Technology and Mechatronics Automation.

[26]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[27]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  François Chollet,et al.  Deep Learning with Python , 2017 .

[29]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Eric Brachmann,et al.  DSAC — Differentiable RANSAC for Camera Localization , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Thomas Brox,et al.  DeMoN: Depth and Motion Network for Learning Monocular Stereo , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[34]  Wolfram Burgard,et al.  VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[35]  Yi Li,et al.  DeepIM: Deep Iterative Matching for 6D Pose Estimation , 2018, International Journal of Computer Vision.

[36]  Ben Glocker,et al.  Real-Time RGB-D Camera Relocalization via Randomized Ferns for Keyframe Encoding , 2015, IEEE Transactions on Visualization and Computer Graphics.

[37]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[38]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[39]  Roland Memisevic,et al.  Learning Visual Odometry with a Convolutional Network , 2015, VISAPP.

[40]  S. Amirhassan Monadjemi,et al.  Iterative algorithm for interactive co-segmentation using semantic information propagation , 2018, Applied Intelligence.

[41]  Jaime G. Carbonell,et al.  Characterizing and Avoiding Negative Transfer , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Tomasz Malisiewicz,et al.  Deep Image Homography Estimation , 2016, ArXiv.

[43]  Philip H. S. Torr,et al.  Collaborative Large-Scale Dense 3D Reconstruction with Online Inter-Agent Pose Optimisation , 2018, IEEE Transactions on Visualization and Computer Graphics.

[44]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[45]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[46]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[48]  Jan Kautz,et al.  Geometry-Aware Learning of Maps for Camera Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.