A Novel Fundus Image Reading Tool for Efficient Generation of a Multi-dimensional Categorical Image Database for Machine Learning Algorithm Training

Background: We describe a novel multi-step retinal fundus image reading system for providing large, high-quality datasets for machine learning algorithms, and assess grader variability in the large-scale dataset generated with this system. Methods: A 5-step retinal fundus image reading tool was developed that rates image quality, presence of abnormality, findings with location information, diagnoses, and clinical significance. Each image was evaluated by 3 different graders, and agreement among graders was evaluated for each decision. Results: A total of 234,242 readings of 79,458 images were collected from 55 licensed ophthalmologists over 6 months. Of these images, 34,364 were graded as abnormal by at least one rater. Among these, all three raters agreed on abnormality in 46.6%, while 69.9% were rated as abnormal by two or more raters. The rate of agreement by at least two raters on a given finding ranged from 26.7% to 65.2%, and the rate of complete agreement among all three raters ranged from 5.7% to 43.3%. For diagnoses, agreement by at least two raters was 35.6%–65.6%, and complete agreement was 11.0%–40.0%. Agreement on findings and diagnoses was higher when restricted to images with prior complete agreement on abnormality. Retinal and glaucoma specialists showed higher agreement on findings and diagnoses within their corresponding subspecialties. Conclusion: This novel reading tool for retinal fundus images generated a large-scale dataset with a high level of information, which can be used in the future development of machine learning-based algorithms for automated identification of abnormal conditions and of clinical decision support systems. These results emphasize the importance of addressing grader variability in algorithm development.
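The agreement figures reported in the abstract can be illustrated with a minimal sketch. It assumes each image carries three binary "abnormal" ratings and that rates are computed over images flagged by at least one rater, as the wording of the Results suggests. The function name `agreement_rates`, the `AgreementRates` type, and the example data are hypothetical and are not taken from the study's software.

```python
from typing import NamedTuple, Sequence


class AgreementRates(NamedTuple):
    n_flagged: int        # images rated abnormal by at least one rater
    at_least_two: float   # fraction of flagged images with >= 2 abnormal ratings
    all_three: float      # fraction of flagged images with 3 abnormal ratings


def agreement_rates(ratings: Sequence[Sequence[bool]]) -> AgreementRates:
    """Compute agreement among three raters on a binary decision.

    `ratings` holds one triple of booleans per image (True = abnormal).
    The denominator is the set of images flagged by at least one rater;
    this is an assumption based on the abstract's wording.
    """
    flagged = [r for r in ratings if any(r)]
    if not flagged:
        return AgreementRates(0, 0.0, 0.0)
    n = len(flagged)
    at_least_two = sum(1 for r in flagged if sum(r) >= 2) / n
    all_three = sum(1 for r in flagged if all(r)) / n
    return AgreementRates(n, at_least_two, all_three)


# Hypothetical ratings for five images, three raters each.
example = [
    (True, True, True),     # complete agreement
    (True, True, False),    # two of three
    (True, False, False),   # one of three
    (False, False, False),  # not flagged, excluded from the denominator
    (False, True, True),    # two of three
]
rates = agreement_rates(example)
```

The same function could be applied per finding or per diagnosis by passing the corresponding binary ratings instead of the abnormality ratings.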
