The Accuracy and Reliability of Crowdsource Annotations of Digital Retinal Images

Purpose: Crowdsourcing is based on outsourcing computationally intensive tasks to numerous individuals in the online community who have no formal training. Our aim was to develop a novel online tool designed to facilitate large-scale annotation of digital retinal images, and to assess the accuracy of crowdsource grading using this tool compared to expert classification.

Methods: We used 100 retinal fundus photographs with predetermined disease criteria selected by two experts from a large cohort study. The Amazon Mechanical Turk Web platform was used to drive traffic to our site so that anonymous workers could perform a classification and annotation task on the fundus photographs in our dataset after a short training exercise. Three groups were assessed: masters only, nonmasters only, and nonmasters with compulsory training. We calculated the sensitivity, specificity, and area under the curve (AUC) of receiver operating characteristic (ROC) plots for all classifications compared to expert grading, and used the Dice coefficient and a consensus threshold to assess annotation accuracy.

Results: In total, we received 5389 annotations for 84 images (excluding 16 training images) in 2 weeks. A specificity and sensitivity of 71% (95% confidence interval [CI], 69%–74%) and 87% (95% CI, 86%–88%) were achieved across all classifications. The AUC for all classifications combined was 0.93 (95% CI, 0.91–0.96). For image annotation, a maximal Dice coefficient (∼0.6) was achieved at a consensus threshold of 0.25.

Conclusions: This study supports the hypothesis that annotation of abnormalities in retinal images by ophthalmologically naive individuals is comparable to expert annotation. The highest AUC and agreement with expert annotation were achieved in the nonmasters with compulsory training group.

Translational Relevance: Crowdsourced retinal image analysis may be comparable to expert grading and has the potential to deliver timely, accurate, and cost-effective image analysis.
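The two annotation-accuracy measures named in Methods, consensus-threshold aggregation of crowd markings and the Dice coefficient against expert markings, can be illustrated with a short sketch. The code below is a minimal illustration only, not the authors' implementation: the function names (consensus_mask, dice), the toy masks, and the assumption that each worker's annotation is available as a binary mask of identical size are all hypothetical.

```python
import numpy as np

def consensus_mask(worker_masks, threshold=0.25):
    """Combine binary crowd annotations into one mask.

    A pixel is kept if at least `threshold` of the workers marked it
    (e.g., threshold=0.25 keeps pixels marked by 25% or more of annotators).
    """
    stack = np.stack(worker_masks).astype(float)   # shape: (workers, H, W)
    vote_fraction = stack.mean(axis=0)             # per-pixel agreement fraction
    return vote_fraction >= threshold

def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: three simulated 4x4 worker annotations vs. an "expert" mask.
expert = np.zeros((4, 4), dtype=bool)
expert[1:3, 1:3] = True
workers = [np.roll(expert, shift, axis=1) for shift in (0, 0, 1)]

crowd = consensus_mask(workers, threshold=0.25)
print(f"Dice vs. expert at consensus threshold 0.25: {dice(crowd, expert):.2f}")
```

Lowering the threshold admits pixels marked by few workers (higher sensitivity, lower precision), while raising it keeps only widely agreed pixels; the study reports the Dice coefficient peaking near a threshold of 0.25.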
