Paper Information - Radiology Objects in COntext (ROCO): A Multimodal Image Dataset


This work introduces a new multimodal image dataset intended to help detect the interplay between visual elements and semantic relations in radiology images. The dataset was built by retrieving all image-caption pairs from the open-access biomedical literature database PubMed Central, since these captions describe the visual content in its semantic context. Compound, multi-pane, and non-radiology images were removed using an automatic binary classifier based on a fine-tuned deep convolutional neural network. The Radiology Objects in COntext (ROCO) dataset contains over 81k radiology images covering several medical imaging modalities, including Computed Tomography, Ultrasound, X-Ray, Fluoroscopy, Positron Emission Tomography, Mammography, Magnetic Resonance Imaging, and Angiography. Every image in ROCO has a corresponding caption, keywords, Unified Medical Language System (UMLS) Concept Unique Identifiers, and Semantic Types. An out-of-class set of 6k images, ranging from synthetic radiology figures to digital art, is also provided to improve prediction and classification performance. With ROCO, systems for caption and keyword generation can be modeled, which enables multimodal representation for datasets that lack text. ROCO can also be used to build systems for image structuring and semantic information tagging, which support image and information retrieval.
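To make the per-image annotations concrete, the following is a minimal sketch of how one entry might be held in memory. The class name `RocoRecord`, its field names, the stop-word list, and the `naive_keywords` helper are all hypothetical and chosen for illustration; they do not reflect the dataset's actual file layout or the authors' keyword extraction method, and the example caption is invented.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "on", "with", "showing", "shows"}


@dataclass
class RocoRecord:
    """Hypothetical in-memory view of one image entry (field names are assumptions)."""
    image_id: str
    caption: str
    keywords: list[str] = field(default_factory=list)
    cuis: list[str] = field(default_factory=list)            # UMLS Concept Unique Identifiers
    semantic_types: list[str] = field(default_factory=list)  # UMLS Semantic Types


def naive_keywords(caption: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words and duplicates (order kept).

    This is a simple stand-in, not the authors' extraction pipeline.
    """
    tokens = re.findall(r"[a-z]+", caption.lower())
    seen: set[str] = set()
    result: list[str] = []
    for token in tokens:
        if token not in STOP_WORDS and token not in seen:
            seen.add(token)
            result.append(token)
    return result


# Invented caption; CUIs and semantic types are left empty rather than guessed.
caption = "Axial CT scan showing a lesion in the left lung"
record = RocoRecord(
    image_id="example_0001",
    caption=caption,
    keywords=naive_keywords(caption),
)
print(record.keywords)
```

In a real setting the `cuis` and `semantic_types` fields would be filled by a concept-mapping tool against UMLS; that step is omitted here because it depends on licensed terminology data.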
