Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs

We have attempted to reproduce the results in Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, published in JAMA 2016; 316(22), using publicly available data sets. We re-implemented the main method in the original study since the source code is not available. The original study used non-public fundus images from EyePACS and three hospitals in India for training. We used a different EyePACS data set from Kaggle. The original study used the benchmark data set Messidor-2 to evaluate the algorithm’s performance. We used another distribution of the Messidor-2 data set, since the original data set is no longer available. In the original study, ophthalmologists re-graded all images for diabetic retinopathy, macular edema, and image gradability. We have one diabetic retinopathy grade per image for our data sets, and we assessed image gradability ourselves. We were not able to reproduce the original study’s results with publicly available data. Our algorithm’s area under the receiver operating characteristic curve (AUC) of 0.951 (95% CI, 0.947-0.956) on the Kaggle EyePACS test set and 0.853 (95% CI, 0.835-0.871) on Messidor-2 did not come close to the reported AUC of 0.99 on both test sets in the original study. This may be caused by the use of a single grade per image, or different data. This study shows the challenges of reproducing deep learning method results, and the need for more replication and reproduction studies to validate deep learning methods, especially for medical image analysis. Our source code and instructions are available at: https://github.com/mikevoets/jama16-retina-replication.

[1]  A. Rakhlin Diabetic Retinopathy detection through integration of Deep Learning classification framework , 2017, bioRxiv.

[2]  Anne E Carpenter,et al.  Opportunities and obstacles for deep learning in biology and medicine , 2017, bioRxiv.

[3]  Bram van Ginneken,et al.  A survey on deep learning in medical image analysis , 2017, Medical Image Anal..

[4]  Matthew D. Davis,et al.  Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. , 2003, Ophthalmology.

[5]  Laude,et al.  FEEDBACK ON A PUBLICLY DISTRIBUTED IMAGE DATABASE: THE MESSIDOR DATABASE , 2014 .

[6]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[7]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[8]  M. Abràmoff,et al.  Improved Automated Detection of Diabetic Retinopathy on a Publicly Available Dataset Through Integration of Deep Learning. , 2016, Investigative ophthalmology & visual science.

[9]  V. Sudha,et al.  Diabetic Retinopathy Detection , 2020, International Journal of Engineering and Advanced Technology.

[10]  The challenges of replication , 2017, eLife.

[11]  Subhashini Venugopalan,et al.  Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. , 2016, JAMA.

[12]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[13]  Rishab Gargeya,et al.  Automated Identification of Diabetic Retinopathy Using Deep Learning. , 2017, Ophthalmology.

[14]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[15]  E.E. Pissaloux,et al.  Image Processing , 1994, Proceedings. Second Euromicro Workshop on Parallel and Distributed Processing.

[16]  Rich Caruana,et al.  Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping , 2000, NIPS.

[17]  Gwénolé Quellec,et al.  Optimal Wavelet Transform for the Detection of Microaneurysms in Retina Photographs , 2008, IEEE Transactions on Medical Imaging.

[18]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[19]  Rajiv Raman,et al.  Fundus photograph-based deep learning algorithms in detecting diabetic retinopathy , 2018, Eye.

[20]  Jonathan Krause,et al.  Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy , 2017, Ophthalmology.

[21]  Reality check on reproducibility , 2016, Nature.

[22]  C. Lee Giles,et al.  Overfitting and neural networks: conjugate gradient and backpropagation , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[23]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Matthew Hutson,et al.  Missing data hinder replication of artificial intelligence studies , 2018 .

[25]  E. Finkelstein,et al.  Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes , 2017, JAMA.