HAPNet: hierarchically aggregated pyramid network for real-time stereo matching

ABSTRACT Recovering the 3D shape of the surgical site is crucial for many computer-assisted interventions. Stereo endoscopes can be used to compute 3D depth, but computational stereo is a challenging, non-convex and inherently discontinuous optimisation problem. In this paper, we propose a deep learning architecture that avoids explicitly constructing a similarity cost volume, which is one of the most computationally costly blocks of stereo algorithms. This makes training our network significantly more efficient and avoids the need for large memory allocation. Our method performs well, especially in regions with multiple discontinuities, such as around surgical instruments and complex small structures. The method compares well to state-of-the-art techniques while taking a different methodological angle on the computational stereo problem in surgical video.
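To give a rough sense of why an explicit cost volume is memory-hungry, the following minimal sketch estimates the size of a dense 4D cost volume of the kind used by volume-based stereo networks. The function name, the image size, the disparity range, the feature channel count and the downsampling factor are all illustrative assumptions and are not taken from the paper.

```python
def cost_volume_bytes(height, width, max_disparity,
                      channels=1, bytes_per_value=4, downsample=1):
    """Estimate the memory of a dense cost volume (hypothetical helper).

    The volume holds one value per (row, column, disparity, channel).
    `downsample` models networks that build the volume at reduced
    resolution; it shrinks the spatial and the disparity axes alike.
    """
    h = height // downsample
    w = width // downsample
    d = max_disparity // downsample
    return h * w * d * channels * bytes_per_value


# Assumed example: a 1280x1024 stereo frame, 192 disparity levels,
# 64 feature channels, float32 values, volume built at 1/4 resolution.
size = cost_volume_bytes(1024, 1280, 192, channels=64, downsample=4)
print(f"{size / 2**30:.2f} GiB for a single volume")
```

Under these assumptions a single volume already approaches one gibibyte before any intermediate activations or gradients are counted, which is the kind of allocation that an architecture without an explicit cost volume does not need.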
