Unsupervised Learning of Endoscopy Video Frames' Correspondences from Global and Local Transformation

Inferring the correspondences between consecutive video frames with high accuracy is essential for many medical image processing and computer vision tasks (e.g. image mosaicking, 3D scene reconstruction). Image correspondences can be computed by feature extraction and matching algorithms, but these are computationally expensive and struggle with low-texture frames. Convolutional neural networks (CNNs) can estimate dense image correspondences with high accuracy, but the lack of labeled data, especially in medical imaging, does not allow end-to-end supervised training. In this paper, we present an unsupervised learning method to estimate dense image correspondences (DIC) between endoscopy frames by developing a new CNN model, called EndoRegNet. Our proposed network has three distinguishing aspects: a local DIC estimator, a polynomial image transformer that regularizes local correspondences, and a visibility mask that refines image correspondences. EndoRegNet was trained on a mix of simulated and real endoscopy video frames, while its performance was evaluated on real endoscopy frames. We compared the results of EndoRegNet with traditional feature-based image registration. Our results show that EndoRegNet provides faster and more accurate image correspondence estimation. It can also effectively deal with the deformations and occlusions that are common in endoscopy video frames, without requiring any labeled data.
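
The abstract does not give the network's equations, so the following is only a minimal NumPy sketch of how the three components named above could interact in an unsupervised objective. It is not the EndoRegNet implementation. Everything specific here is an assumption made for illustration: the second-order polynomial basis, the normalized [-1, 1] coordinate convention, the additive combination of global and local terms, the L1 photometric error, and all function names (`polynomial_grid`, `bilinear_sample`, `masked_photometric_loss`).

```python
import numpy as np

def polynomial_grid(coeffs, height, width):
    """Global transform: map normalized pixel coordinates through a
    2nd-order polynomial (assumed order, chosen for illustration).
    coeffs has shape (2, 6): one row for x', one for y', with columns
    matching the basis [1, x, y, x^2, x*y, y^2]."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    basis = np.stack([np.ones_like(xs), xs, ys, xs**2, xs * ys, ys**2])
    return np.tensordot(coeffs, basis, axes=1)  # shape (2, H, W)

def bilinear_sample(image, grid):
    """Sample a grayscale image (H, W) at normalized coordinates
    grid (2, H, W); out-of-range coordinates are clamped to the border."""
    h, w = image.shape
    x = np.clip((grid[0] + 1) * 0.5 * (w - 1), 0, w - 1)
    y = np.clip((grid[1] + 1) * 0.5 * (h - 1), 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def masked_photometric_loss(target, source, coeffs, local_flow, mask):
    """Warp `source` toward `target` with the global polynomial grid plus
    a local displacement field (2, H, W), then average the absolute
    intensity error over pixels the visibility mask marks as valid."""
    grid = polynomial_grid(coeffs, *target.shape) + local_flow
    warped = bilinear_sample(source, grid)
    return np.sum(mask * np.abs(target - warped)) / (np.sum(mask) + 1e-8)

# Identity polynomial: x' = x, y' = y.
IDENTITY = np.array([[0, 1, 0, 0, 0, 0],
                     [0, 0, 1, 0, 0, 0]], dtype=float)
```

In a learned setting, `coeffs`, `local_flow`, and `mask` would be network outputs and the loss would be minimized over pairs of consecutive frames, which is what removes the need for labeled correspondences. Setting mask entries to zero excludes occluded pixels from the error, so they do not pull the estimated correspondences toward incorrect matches.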