Performance Comparison of Individual and Ensemble CNN Models for the Classification of Brain 18F-FDG-PET Scans

The high background glucose metabolism of normal gray matter on [18F]-fluoro-2-D-deoxyglucose (FDG) positron emission tomography (PET) of the brain results in a low signal-to-background ratio, which may increase the chance of missing important findings in patients with intracranial malignancies. To explore whether a deep learning classifier can help distinguish normal from abnormal findings on PET brain images, this study evaluated the performance of two-dimensional convolutional neural networks (2D-CNNs) in classifying FDG-PET brain scans as normal (N) or abnormal (A).

Methods: Two hundred eighty-nine brain FDG-PET scans (N, n = 150; A, n = 139), yielding a total of 68,260 images, were included. Nine individual 2D-CNN models, one for each combination of three window settings and three axes (axial, coronal, and sagittal), were trained and validated. The performance of the individual and ensemble models was evaluated and compared on a test dataset. Odds ratio, Akaike’s information criterion (AIC), area under the receiver operating characteristic curve (AUC), accuracy, and standard deviation (SD) were calculated.

Results: The optimal window setting for classifying normal and abnormal scans differed for each axis among the individual models. An ensembled model using different axes with an optimized window setting (window-triad) showed better performance than ensembled models using the same axis and different window settings (axis-triad). Increases in odds ratio and decreases in SD were observed in both axis-triad and window-triad models compared with individual models, whereas improvements in AUC and AIC were seen in window-triad models. An overall model averaging the probabilities of all individual models showed the best accuracy, 82.0%.

Conclusions: Ensembling data from different window settings and axes effectively improved 2D-CNN performance parameters for the classification of brain FDG-PET scans. If prospectively validated with a larger cohort of patients, similar models could provide decision support in a clinical setting.
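The probability-averaging ensemble described in the abstract can be illustrated with a minimal sketch. The abstract does not say how image-level outputs were aggregated to scan level or which decision threshold was used, so the sketch assumes each model already yields one abnormal probability per scan and applies a 0.5 cutoff. The function names, model keys, and probability values are hypothetical and are not taken from the study.

```python
import numpy as np


def ensemble_average(probs_by_model):
    """Average per-scan abnormal probabilities across models.

    probs_by_model maps a model name to a sequence of probabilities,
    one per scan, in the same scan order for every model.
    """
    stacked = np.stack(
        [np.asarray(p, dtype=float) for p in probs_by_model.values()]
    )
    return stacked.mean(axis=0)


def classify(probs, threshold=0.5):
    """Label each scan abnormal (A) or normal (N).

    The 0.5 threshold is an assumption, not a value from the study.
    """
    return np.where(np.asarray(probs) >= threshold, "A", "N")


# Hypothetical per-scan probabilities from three single-axis models.
probs_by_model = {
    "axial": [0.9, 0.2, 0.6],
    "coronal": [0.7, 0.4, 0.3],
    "sagittal": [0.8, 0.3, 0.45],
}

avg = ensemble_average(probs_by_model)
labels = classify(avg)
```

Passing three models gives a triad-style ensemble; passing all nine individual models would correspond to the overall model, under the same assumptions.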
