Concept Detection based on Multi-label Classification and Image Captioning Approach - DAMO at ImageCLEF 2019

Medical image captioning is an important and challenging task, which covers computer vision and natural language processing. This ImageCLEF 2019 [6] Caption competition is dedicated to research this field. The purpose of this year challenge is using radiological images to detect the concepts representing the key information. In this paper, we illustrate the proposed method to address the issue, based on multilabel classification model and CNN-LSTM architecture with attention mechanism. We also perform a detailed analysis and processing for the overall dataset and demonstrate performance with the baseline in the caption prediction task. In final evaluation, we completed 9 submissions and ranked second among 12 participants with our best mean F1-score.

[1]  Yejin Choi,et al.  Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.

[2]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[3]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Carlos R. del-Blanco,et al.  ImageCLEF 2019: Multimedia Retrieval in Medicine, Lifelogging, Security and Nature , 2019, CLEF.

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Pratik Rane,et al.  Self-Critical Sequence Training for Image Captioning , 2018 .

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[12]  R. J. Roberts PubMed Central: The GenBank of the published literature. , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Olivier Bodenreider,et al.  The Unified Medical Language System (UMLS): integrating biomedical terminology , 2004, Nucleic Acids Res..

[14]  Henning Müller,et al.  Overview of the ImageCLEFmed 2019 Concept Detection Task , 2019, CLEF.

[15]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Richard Socher,et al.  Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).