Six Degree-of-Freedom Localization of Endoscopic Capsule Robots using Recurrent Neural Networks embedded into a Convolutional Neural Network

Since its development, ingestible wireless capsule endoscopy has been considered a painless diagnostic method for detecting a number of diseases inside the GI tract. Medical engineering companies have made significant improvements to this technology over the last decade; however, some major limitations remain. Localization of the next-generation steerable endoscopic capsule robot in six degrees of freedom (6 DoF) and active motion control are among these limitations. Accurate localization is essential for correctly identifying the diseased area. This paper presents a robust 6-DoF localization method based on supervised training of an architecture consisting of recurrent neural networks (RNNs) embedded into a convolutional neural network (CNN), exploiting both the per-frame information extracted by the CNN and the correlative information across frames captured by the RNNs. To our knowledge, this is the first time the idea of embedding RNNs into a CNN architecture has been proposed in the literature. Experimental results show that the proposed RNN-in-CNN architecture performs very well for endoscopic capsule robot localization in the presence of reflection distortions, noise, sudden camera movements, and a lack of distinguishable features.
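To make the abstract's architecture description concrete, the following is a minimal sketch of a CNN feature extractor with an embedded LSTM regressing a 6-DoF pose per frame. It is written in PyTorch for illustration only; the layer sizes, the choice of LSTM cells, and the 6-component pose parameterization are assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn

class RNNinCNNPose(nn.Module):
    """Illustrative sketch: per-frame CNN features feed an embedded LSTM
    that regresses a 6-DoF pose (3 translation + 3 rotation components).
    All layer sizes are hypothetical, not the paper's configuration."""

    def __init__(self, hidden_size=256):
        super().__init__()
        # Per-frame ("just-in-moment") spatial features from a small CNN.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # LSTM embedded after the convolutional stage models the
        # correlative information across consecutive frames.
        self.rnn = nn.LSTM(input_size=128, hidden_size=hidden_size,
                           num_layers=2, batch_first=True)
        # Regress translation (x, y, z) and rotation (e.g., Euler angles).
        self.pose_head = nn.Linear(hidden_size, 6)

    def forward(self, frames):
        # frames: (batch, seq_len, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.view(b * t, c, h, w)).view(b, t, -1)
        seq_feats, _ = self.rnn(feats)
        return self.pose_head(seq_feats)  # (batch, seq_len, 6)


# Usage: predict poses for a batch of 2 clips of 8 endoscopic frames.
model = RNNinCNNPose()
poses = model(torch.randn(2, 8, 3, 128, 128))
print(poses.shape)  # torch.Size([2, 8, 6])
```

In such a supervised setting, the network would typically be trained against ground-truth poses with a regression loss (e.g., a weighted sum of translation and rotation errors), though the specific loss used in the paper is not stated in the abstract.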
