Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation

Generating surgical reports aimed at surgical scene understanding in robot-assisted surgery can contribute to documenting entry tasks and post-operative analysis. Despite impressive results, deep learning models degrade in performance when applied to different domains under domain shift. In addition, new instruments and variations in surgical tissues appear in robotic surgery. In this work, we propose class-incremental domain adaptation (CIDA) with a multi-layer transformer-based model to tackle new classes and domain shift in the target domain when generating surgical reports during robotic surgery. To adapt to incremental classes and extract domain-invariant features, a class-incremental (CI) learning method with a supervised contrastive (SupCon) loss is incorporated into the feature extractor. To generate captions from the extracted features, curriculum by one-dimensional Gaussian smoothing (CBS) is integrated with a multi-layer transformer-based caption prediction model. CBS smooths the feature embedding using anti-aliasing and helps the model learn domain-invariant features. We also adopt label smoothing (LS) to calibrate prediction probabilities and obtain better feature representations in both the feature extractor and the captioning model. The proposed techniques are empirically evaluated on datasets from two surgical domains: nephrectomy operations and transoral robotic surgery. We observe that domain-invariant feature learning and a well-calibrated network improve surgical report generation performance in both the source and target domains under domain shift and unseen classes, in one-shot and few-shot settings. The code is publicly available at https://github.com/XuMengyaAmy/CIDACaptioning.
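The two smoothing operations mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it is a hypothetical NumPy illustration of (a) anti-aliasing a 1-D feature embedding with a normalized Gaussian kernel, as in curriculum by smoothing, and (b) classic label smoothing, which redistributes a fraction `eps` of the probability mass from the target class to all classes. The kernel size and `eps` values are illustrative defaults, not the paper's hyperparameters.

```python
import numpy as np

def gaussian_kernel_1d(size=5, sigma=1.0):
    """Discrete 1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth_features(features, size=5, sigma=1.0):
    """Anti-alias a 1-D feature embedding by convolving it with a Gaussian kernel.

    In curriculum by smoothing, sigma is annealed toward 0 during training,
    so early epochs see heavily blurred features and later epochs sharper ones.
    """
    kernel = gaussian_kernel_1d(size, sigma)
    return np.convolve(features, kernel, mode="same")

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: keep (1 - eps) on the target and spread eps uniformly."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / num_classes
```

Annealing `sigma` toward zero recovers the original (unsmoothed) features, which is what makes the schedule a curriculum; label smoothing, by contrast, is typically kept fixed and serves to calibrate the predicted probabilities.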
