Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists

Background: Deep learning convolutional neural networks (CNN) may facilitate melanoma detection, but data comparing a CNN's diagnostic performance to larger groups of dermatologists are lacking.

Methods: Google's Inception v4 CNN architecture was trained and validated using dermoscopic images and corresponding diagnoses. In a comparative cross-sectional reader study, a 100-image test-set was used (level-I: dermoscopy only; level-II: dermoscopy plus clinical information and images). Main outcome measures were sensitivity, specificity and area under the curve (AUC) of receiver operating characteristics (ROC) for diagnostic classification (dichotomous) of lesions by the CNN versus an international group of 58 dermatologists during level-I or -II of the reader study. Secondary end points included the dermatologists' diagnostic performance in their management decisions and differences in the diagnostic performance of dermatologists during level-I and -II of the reader study. Additionally, the CNN's performance was compared with the top-five algorithms of the 2016 International Symposium on Biomedical Imaging (ISBI) challenge.

Results: In level-I, dermatologists achieved a mean (±standard deviation) sensitivity and specificity for lesion classification of 86.6% (±9.3%) and 71.3% (±11.2%), respectively. More clinical information (level-II) improved the sensitivity to 88.9% (±9.6%, P = 0.19) and specificity to 75.7% (±11.7%, P < 0.05). The CNN ROC curve revealed a higher specificity of 82.5% when compared with dermatologists in level-I (71.3%, P < 0.01) and level-II (75.7%, P < 0.01) at their sensitivities of 86.6% and 88.9%, respectively. The CNN ROC AUC was greater than the mean ROC area of dermatologists (0.86 versus 0.79, P < 0.01). The CNN scored results close to the top three algorithms of the ISBI 2016 challenge.

Conclusions: For the first time, we compared a CNN's diagnostic performance with a large international group of 58 dermatologists, including 30 experts. Most dermatologists were outperformed by the CNN. Physicians, irrespective of their experience, may benefit from assistance by a CNN's image classification.

Clinical trial number: This study was registered at the German Clinical Trial Register (DRKS-Study-ID: DRKS00013570; https://www.drks.de/drks_web/).
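The outcome measures named above (sensitivity, specificity, ROC AUC, and the CNN's specificity read off its ROC curve at the dermatologists' mean sensitivity) can be illustrated with a minimal Python sketch. It is not the study's statistical code: the function names and the ten-lesion toy data are hypothetical, and the study's exact procedure for locating the operating point is not described here, so the "best specificity among thresholds that reach the target sensitivity" rule is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (sensitivity, specificity)


def sensitivity_specificity(labels: List[int], calls: List[int]) -> Point:
    """Sensitivity and specificity of dichotomous calls (1 = melanoma, 0 = benign)."""
    tp = sum(1 for y, c in zip(labels, calls) if y == 1 and c == 1)
    fn = sum(1 for y, c in zip(labels, calls) if y == 1 and c == 0)
    tn = sum(1 for y, c in zip(labels, calls) if y == 0 and c == 0)
    fp = sum(1 for y, c in zip(labels, calls) if y == 0 and c == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(labels: List[int], scores: List[float]) -> List[Point]:
    """One operating point per distinct score threshold, plus the 'call nothing' point."""
    points = [(0.0, 1.0)]  # threshold above every score: no lesion called melanoma
    for threshold in sorted(set(scores), reverse=True):
        calls = [1 if s >= threshold else 0 for s in scores]
        points.append(sensitivity_specificity(labels, calls))
    return points


def roc_auc(points: List[Point]) -> float:
    """Trapezoidal area under the curve of sensitivity against (1 - specificity)."""
    xy = sorted((1.0 - spec, sens) for sens, spec in points)
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def specificity_at_sensitivity(points: List[Point], target: float) -> float:
    """Assumed rule: best specificity among operating points reaching the target sensitivity."""
    return max(spec for sens, spec in points if sens >= target)


# Hypothetical toy data, NOT from the study: 4 melanomas, 6 benign nevi.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

toy_points = roc_points(toy_labels, toy_scores)
toy_auc = roc_auc(toy_points)
# Specificity of the classifier at a reader group's (hypothetical) mean sensitivity of 75%.
toy_spec = specificity_at_sensitivity(toy_points, 0.75)
```

In this framing, a single reader contributes one (sensitivity, specificity) point from `sensitivity_specificity`, while a classifier that outputs a continuous score contributes a whole curve; comparing the two at a matched sensitivity is what makes the specificities directly comparable.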
