A simplified cluster model and a tool adapted for collaborative labeling of lung cancer CT scans

BACKGROUND AND OBJECTIVE Lung cancer is the most common type of cancer and has a high mortality rate. Early detection using medical imaging is critically important for the long-term survival of patients. Computer-aided diagnosis (CAD) tools can potentially reduce the number of incorrect interpretations of medical image data by radiologists. Datasets with adequate sample size, annotation, and ground truth are the dominant factors in developing and training effective CAD algorithms. The objective of this study was to produce a practical approach and a tool for the creation of medical image datasets.

METHODS The proposed model uses a modified maximum transverse diameter approach to mark a putative lung nodule. The modification allows a set of overlapping spheres of appropriate size to be used to approximate the shape of the nodule. The algorithm embedded in the model also groups the marks made by different readers for the same lesion. We used the data of 536 randomly selected patients of Moscow outpatient clinics to create a dataset of standard-dose chest computed tomography (CT) scans utilizing the double-reading approach with arbitration. Six volunteer radiologists independently produced a report for each scan using the proposed model, with the main focus on the detection of lesions with sizes ranging from 3 to 30 mm. An arbitrator then reviewed their marks and annotations.

RESULTS The maximum transverse diameter approach outperformed the alternative methods (3D box, ellipsoid, and complete outline construction) in terms of accuracy and speed of nodule shape approximation in a study of 10,000 computer-generated tumor models of different shapes. The markup and annotation of the CTLungCa-500 dataset revealed 72 studies containing no lung nodules. The remaining 464 CT scans contained 3151 lesions marked by at least one radiologist: 56%, 14%, and 29% of the lesions were malignant, benign, and non-nodular, respectively. Of these lesions, 2887 were of the target size of 3-30 mm. Only 70 nodules were identified by all six readers. Increasing the number of independent readers interpreting the CT scans increased accuracy but decreased agreement. The dataset markup process took three working weeks.

CONCLUSIONS The developed cluster model simplifies the collaborative and crowdsourced creation of image repositories and makes it time-efficient. Our proof-of-concept dataset provides a valuable source of annotated medical imaging data for training CAD algorithms aimed at early detection of lung nodules. The tool and the dataset are publicly available at https://github.com/Center-of-Diagnostics-and-Telemedicine/FAnTom.git and https://mosmed.ai/en/datasets/ct_lungcancer_500/, respectively.
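The grouping of marks made by different readers for the same lesion can be illustrated with a minimal sketch. It is not the algorithm implemented in the tool: the abstract does not state the grouping criterion, so the sketch assumes that two spherical marks belong to the same lesion when their spheres overlap (center distance smaller than the sum of the radii) and merges marks transitively with a union-find structure. The names `Mark`, `spheres_overlap`, `group_marks`, and `readers_in` are hypothetical, and the coordinates are made-up illustrative values in millimeters.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Mark:
    """One spherical mark placed by a reader (hypothetical structure)."""
    reader: str
    center: tuple  # (x, y, z) in mm
    radius: float  # mm


def spheres_overlap(a: Mark, b: Mark) -> bool:
    # Assumed criterion: spheres intersect if the distance between
    # their centers is smaller than the sum of their radii.
    return dist(a.center, b.center) < a.radius + b.radius


def group_marks(marks: list) -> list:
    """Group marks into clusters of transitively overlapping spheres."""
    parent = list(range(len(marks)))

    def find(i: int) -> int:
        # Find the cluster representative, halving the path on the way up.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(marks)):
        for j in range(i + 1, len(marks)):
            if spheres_overlap(marks[i], marks[j]):
                parent[find(i)] = find(j)

    clusters = {}
    for i, mark in enumerate(marks):
        clusters.setdefault(find(i), []).append(mark)
    return list(clusters.values())


def readers_in(cluster: list) -> set:
    """Return the set of readers who contributed to a cluster."""
    return {mark.reader for mark in cluster}


# Illustrative data: readers A and B mark the same lesion, A marks another.
marks = [
    Mark("A", (10.0, 10.0, 10.0), 4.0),
    Mark("B", (12.0, 10.0, 10.0), 3.0),
    Mark("A", (50.0, 50.0, 50.0), 5.0),
]
clusters = group_marks(marks)
agreement = sorted(len(readers_in(c)) for c in clusters)
```

Under this assumption, the number of distinct readers per cluster gives a simple per-lesion agreement count, which is the kind of quantity behind the statement that only 70 nodules were identified by all six readers. The pairwise comparison is quadratic in the number of marks per scan, which is acceptable for a sketch but may need a spatial index for large inputs.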
