Autonomously Navigating a Surgical Tool Inside the Eye by Learning from Demonstration

A fundamental challenge in retinal surgery is safely navigating a surgical tool to a desired goal position on the retinal surface while avoiding damage to surrounding tissue, a maneuver that typically demands tens-of-microns accuracy. In practice, the surgeon relies on depth-estimation skill to localize the tool-tip relative to the retina, which is prone to human error. To reduce this uncertainty, prior work has assisted the surgeon by estimating the tool-tip distance to the retina and providing haptic or auditory feedback. Automating the tool-navigation task itself, however, remains unsolved and largely unexplored. If reliably automated, this capability could serve as a building block for streamlining complex procedures and reducing the risk of tissue damage. To this end, we propose to automate the tool-navigation task by learning to mimic expert demonstrations. Specifically, a deep network is trained on recorded expert trajectories to visually servo the tool toward goal locations on the retina specified by the user. The proposed autonomous navigation system is evaluated in simulation and in physical experiments using a silicone eye phantom. We show that the network reliably navigates a needle surgical tool to various desired locations with an average accuracy of 137 µm in physical experiments and 94 µm in simulation, and that it generalizes well to unseen situations such as the presence of auxiliary surgical tools, varying eye backgrounds, and changed brightness conditions.
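The abstract does not give architectural or training details, but the core idea, a network trained to imitate expert trajectories toward a user-specified goal, is goal-conditioned behavioral cloning. The sketch below illustrates that training loop under stated assumptions: a tiny two-layer MLP stands in for the deep network, synthetic feature vectors stand in for encoded images, and the "expert policy" generating the demonstration actions is invented purely so the script is self-contained. None of the names or dimensions come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic demonstration dataset: state features S (stand-in for an image
# encoding), goal positions G, and expert actions A recorded for each pair.
S = rng.normal(size=(256, 8))
G = rng.normal(size=(256, 2))
A = 0.1 * S[:, :3] + np.hstack([G, G[:, :1]])  # made-up expert policy

X = np.hstack([S, G])  # goal-conditioned input: concatenate state and goal

# Tiny two-layer MLP (stand-in for the deep network in the paper).
W1 = rng.normal(scale=0.1, size=(10, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 3));  b2 = np.zeros(3)

def forward(X, W1, b1, W2, b2):
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

# Behavioral cloning: regress predicted actions onto expert actions (MSE).
lr, losses = 0.05, []
for _ in range(200):
    H, pred = forward(X, W1, b1, W2, b2)
    err = pred - A
    losses.append(float(np.mean(err ** 2)))
    # Manual backprop through the two-layer MLP.
    gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)   # tanh derivative
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

At deployment, the learned policy would be queried repeatedly with the current image encoding and the fixed user-specified goal, producing a tool motion at each step; the imitation loss alone says nothing about safety margins, which is why the paper evaluates accuracy in physical phantom experiments.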
