Deeply Learned 2D Tool Pose Estimation for Robot-to-Camera Registration

Robot-assisted eye surgery is the central topic of the EU-funded project EurEyeCase. A major objective of the project is to develop methods for performing two surgical procedures that human surgeons cannot easily carry out, namely retinal vein cannulation and retinal membrane peeling. In the proposed assistive system, visual guidance is provided by a camera mounted on the microscope. To guide the robot using visual cues, the camera coordinates must be registered to the robot coordinates. To this end, we propose a framework that estimates the position and pose of the tool in order to register the two coordinate systems. Building on recent advances in convolutional neural networks (CNNs), we present a comparative study of several intuitive architectural designs and suggest a methodology for registering the coordinates by detecting pre-defined keypoints. Results suggest that tool pose estimation can be highly accurate while running in real time on a GPU.
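
The abstract does not spell out how detected keypoints are turned into a registration, so the following is only a minimal sketch of one generic way it could be done, not the authors' method. It assumes that each pre-defined tool keypoint has a known 3D position in the robot frame (for example from robot kinematics) and a detected 2D position in the image, and it fits a 3x4 projection matrix with the standard Direct Linear Transform. The function names `estimate_projection_dlt` and `project` are hypothetical, and the sketch omits the coordinate normalization and outlier handling that real, noisy detections would need.

```python
import numpy as np


def estimate_projection_dlt(points_robot, points_image):
    """Fit a 3x4 matrix P such that image ~ P @ [robot; 1] (up to scale).

    Assumption: points_robot holds keypoint positions in the robot frame
    and points_image holds the matching 2D keypoint detections.
    """
    points_robot = np.asarray(points_robot, dtype=float)
    points_image = np.asarray(points_image, dtype=float)
    if len(points_robot) != len(points_image) or len(points_robot) < 6:
        raise ValueError("need at least 6 matched 3D-2D keypoints")

    rows = []
    for (x, y, z), (u, v) in zip(points_robot, points_image):
        xh = np.array([x, y, z, 1.0])
        zeros = np.zeros(4)
        # Each correspondence gives two linear constraints on the 12
        # entries of P: p1.X - u * p3.X = 0 and p2.X - v * p3.X = 0.
        rows.append(np.concatenate([xh, zeros, -u * xh]))
        rows.append(np.concatenate([zeros, xh, -v * xh]))

    # The solution is the right singular vector belonging to the
    # smallest singular value of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 4)


def project(projection, points_robot):
    """Map robot-frame 3D points to image coordinates using projection."""
    points_robot = np.asarray(points_robot, dtype=float)
    ones = np.ones((len(points_robot), 1))
    homogeneous = np.hstack([points_robot, ones]) @ projection.T
    return homogeneous[:, :2] / homogeneous[:, 2:3]
```

Once such a matrix has been fitted from a set of tool poses, `project` maps any robot-frame position into the camera image, which is one way to relate the two coordinate systems. The keypoints used for the fit must not all lie in a single plane, otherwise the linear system is degenerate.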