Proactive Construction of an Annotated Imaging Database for Artificial Intelligence Training

Artificial intelligence (AI) holds much promise for enabling highly desired improvements in imaging diagnostics. One of the most limiting bottlenecks in the development of useful clinical-grade AI models is the lack of training data. One aspect is the large number of cases needed, and another is the necessity of high-quality ground truth annotation. The aim of the project was to construct, and to describe the construction of, a database with substantial amounts of detail-annotated oncology imaging data from pathology and radiology. A specific objective was to be proactive, that is, to support subsequent AI training that was not yet defined, across a wide range of tasks such as detection, quantification, segmentation, and classification. This objective puts particular focus on the quality and generality of the annotations. The main outcome of the project was the database itself, a collection of labeled image data from breast, ovary, skin, colon, skeleton, and liver. The effort also served as an exploration of best practices for scaling up high-quality image collections. A main contribution of the study is a set of generic lessons learned on how to successfully organize efforts to construct medical imaging databases for AI training, summarized as eight guiding principles covering team, process, and execution aspects.
