Rapid Grading of Fundus Photographs for Diabetic Retinopathy Using Crowdsourcing

Background: Screening for diabetic retinopathy is both effective and cost-effective, but rates of screening compliance remain suboptimal. As screening improves, new methods of handling screening data may help reduce human resource needs. Crowdsourcing has been used in many contexts to harness distributed human intelligence for the completion of small tasks, including image categorization.

Objective: Our goal was to develop and validate a novel method for fundus photograph grading.

Methods: An interface for fundus photo classification was developed for the Amazon Mechanical Turk crowdsourcing platform. For an initial proof of concept (Phase I), we posted 19 expert-graded images for grading by Turkers, with 10 repetitions per photo. Turkers were paid US $0.10 per image. In Phase II, one prototypical image from each of the four grading categories received 500 unique Turker interpretations. Fifty random draws of 1 to 50 Turkers were then used to estimate the variance in accuracy across samples of increasing crowd size and thereby determine the minimum number of Turkers needed to produce valid results (a simulation of this resampling procedure is sketched below). In Phase III, the interface was modified in an attempt to improve Turker grading.

Results: Across 230 grading instances in the normal-versus-abnormal arm of Phase I, 187 (81.3%) were correctly classified by Turkers. Average time to grade each image was 25 seconds, including time to review the training images. With the addition of grading categories, time to grade each image increased and the percentage of images graded correctly decreased. In Phase II, the area under the curve (AUC) of the receiver operating characteristic (ROC) indicated that sensitivity and specificity were maximized after 7 graders for ratings of normal versus abnormal (AUC=0.98) but were significantly reduced (AUC=0.63) when Turkers were asked to specify the level of severity. With improvements to the interface in Phase III, the proportion of images correctly classified by the mean Turker grade in four-category grading increased from 26.3% (5/19 images) to a maximum of 52.6% (10/19 images). Throughout all trials, 100% sensitivity for normal versus abnormal was maintained.

Conclusions: With minimal training, the Amazon Mechanical Turk workforce can rapidly and correctly categorize fundus photos of diabetic patients as normal or abnormal, although further refinement of the methodology is needed to improve Turker ratings of the degree of retinopathy. Images were interpreted for a total cost of US $1.10 per eye. Crowdsourcing may offer a novel and inexpensive means of reducing the skilled-grader burden and increasing screening for diabetic retinopathy.
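
To make the Phase II crowd-size analysis concrete, the following is a minimal Python sketch of the resampling procedure. Everything in it is illustrative rather than taken from the study: the per-image agreement rates are invented, all names are hypothetical, and a simple majority vote is assumed as the aggregation rule. For each crowd size from 1 to 50, it takes 50 random draws from a pool of 500 grades per image and records how the mean and spread of accuracy change as the crowd grows.

```python
import random
import statistics

# Hypothetical stand-in for the Phase II data: 500 Turker grades per
# prototype image (True = graded "abnormal"), plus the expert ground
# truth. The agreement probabilities below are invented for the sketch.
random.seed(0)
turker_grades = {
    "normal":   [random.random() < 0.15 for _ in range(500)],
    "mild":     [random.random() < 0.70 for _ in range(500)],
    "moderate": [random.random() < 0.90 for _ in range(500)],
    "severe":   [random.random() < 0.98 for _ in range(500)],
}
expert_abnormal = {"normal": False, "mild": True, "moderate": True, "severe": True}

def majority_vote(sample):
    """Call the image abnormal if more than half of the sampled graders did."""
    return sum(sample) > len(sample) / 2

def accuracy_by_crowd_size(n_draws=50, max_crowd=50):
    """For each crowd size 1..max_crowd, take n_draws random samples of
    Turker grades per image and record the mean and SD of vote accuracy."""
    results = {}
    for n in range(1, max_crowd + 1):
        accuracies = []
        for _ in range(n_draws):
            correct = sum(
                majority_vote(random.sample(grades, n)) == expert_abnormal[img]
                for img, grades in turker_grades.items()
            )
            accuracies.append(correct / len(turker_grades))
        results[n] = (statistics.mean(accuracies), statistics.stdev(accuracies))
    return results

for n, (mean_acc, sd) in accuracy_by_crowd_size().items():
    if n in (1, 3, 7, 15, 50):
        print(f"crowd size {n:2d}: mean accuracy {mean_acc:.2f} (SD {sd:.2f})")
```

In a simulation of this shape, normal-versus-abnormal accuracy typically stabilizes after a handful of graders, which matches the behavior the study reports (sensitivity and specificity maximized after 7 graders), while finer severity distinctions would require modeling per-category grades rather than a binary vote.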
