CT-Video Matching for Retrograde Intrarenal Surgery Based on Depth Prediction and Style Transfer

Retrograde intrarenal surgery (RIRS) is a minimally invasive endoscopic procedure for the treatment of kidney stones. Traditionally, RIRS is usually performed by reconstructing a 3D model of the kidney from preoperative CT images in order to locate the kidney stones; then, the surgeon finds and removes the stones with experience in endoscopic video. However, due to the many branches within the kidney, it can be difficult to relocate each lesion and to ensure that all branches are searched, which may result in the misdiagnosis of some kidney stones. To avoid this situation, we propose a convolutional neural network (CNN)-based method for matching preoperative CT images and intraoperative videos for the navigation of ureteroscopic procedures. First, a pair of synthetic images and depth maps reflecting preoperative information are obtained from a 3D model of the kidney. Then, a style transfer network is introduced to transfer the ureteroscopic images to the synthetic images, which can generate the associated depth maps. Finally, the fusion and matching of depth maps of preoperative images and intraoperative video images are realized based on semantic features. Compared with the traditional CT-video matching method, our method achieved a five times improvement in time performance and a 26% improvement in the top 10 accuracy.

[1]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Toshimitsu Kaneko,et al.  Deep monocular 3D reconstruction for assisted navigation in bronchoscopy , 2017, International Journal of Computer Assisted Radiology and Surgery.

[3]  Chunhua Shen,et al.  Estimating Depth From Monocular Images as Classification Using Deep Fully Convolutional Residual Networks , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[4]  Ivor W. Tsang,et al.  Domain Adaptation via Transfer Component Analysis , 2009, IEEE Transactions on Neural Networks.

[5]  Russell H. Taylor,et al.  Evaluation and Stability Analysis of Video-Based Navigation System for Functional Endoscopic Sinus Surgery on In Vivo Clinical Data , 2018, IEEE Transactions on Medical Imaging.

[6]  Ashutosh Saxena,et al.  Make3D: Learning 3D Scene Structure from a Single Still Image , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[8]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[9]  William E. Higgins,et al.  Interactive CT-Video Registration for the Continuous Guidance of Bronchoscopy , 2013, IEEE Transactions on Medical Imaging.

[10]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[11]  Daniel Mirota,et al.  A System for Video-Based Navigation for Endoscopic Endonasal Skull Base Surgery , 2012, IEEE Transactions on Medical Imaging.