Cameras for Public Health Surveillance: A Methods Protocol for Crowdsourced Annotation of Point-of-Sale Photographs

Background: Photographs are an effective way to collect detailed and objective information about the environment, particularly for public health surveillance. However, accurately and reliably annotating (ie, extracting information from) photographs remains difficult, a critical bottleneck inhibiting the use of photographs for systematic surveillance. The advent of distributed human computation (ie, crowdsourcing) platforms represents a veritable breakthrough, making it possible for the first time to accurately, quickly, and repeatedly annotate photos at relatively low cost.

Objective: This paper describes a methods protocol, using photographs from point-of-sale surveillance studies in the field of tobacco control, to demonstrate the development and testing of custom-built tools that can greatly enhance the quality of crowdsourced annotation.

Methods: Enhancing the quality of crowdsourced photo annotation requires a number of approaches and tools. The crowdsourced photo annotation process is greatly simplified by decomposing the overall process into smaller tasks, which improves accuracy and speed and enables adaptive processing, in which irrelevant data are filtered out and more difficult targets receive increased scrutiny. Additionally, zoom tools enable users to see details within photographs, and crop tools highlight where within an image a specific object of interest is found, generating a set of photographs that answer specific questions. Beyond such tools, optimizing the number of raters (ie, crowd size) for accuracy and reliability is an important facet of crowdsourced photo annotation. This number can be determined systematically, based on the difficulty of the task and the desired level of accuracy, using receiver operating characteristic (ROC) analyses. Usability tests of the zoom and crop tools suggest that these tools significantly improve annotation accuracy. The tests asked raters to extract data from photographs, not to assess the quality of that data, but rather to assess the usefulness of the tools. The proportion of individuals accurately identifying the presence of a specific advertisement was higher when raters were provided with pictures of the product’s logo and an example of the ad, and higher still when they were also provided the zoom tool (χ²₂=155.7, P<.001). Similarly, when provided cropped images, a significantly greater proportion of respondents accurately identified the presence of cigarette product ads (χ²₁=75.14, P<.001) and reported being able to read prices (χ²₂=227.6, P<.001). Comparing crowdsourced photo-only assessments with traditional field survey data showed an excellent level of correspondence: the areas under the ROC curves produced by sensitivity analyses averaged over 0.95, and on average 10 to 15 crowdsourced raters were required to achieve values above 0.90.

Results: Further testing and improvement of these tools and processes is currently underway. This includes conducting systematic evaluations that crowdsource photograph annotation and methodically assess the quality of raters’ work.

Conclusions: Overall, the combination of crowdsourcing technologies with tiered data flow and tools that enhance annotation quality represents a breakthrough solution to the problem of photograph annotation, vastly expanding opportunities to use photographs rich in public health and other data on a scale previously unimaginable.
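The crowd-size optimization described in the Methods can be sketched in code. The following Python simulation is a minimal illustration, not the authors' analysis pipeline: the per-rater sensitivity and specificity, photo count, ad prevalence, proportion-of-votes aggregation rule, and the 0.90 AUC target are all hypothetical assumptions chosen to mimic a difficult annotation target. It aggregates independent rater votes into a per-photo score, computes the ROC AUC against a gold standard (standing in for field survey data), and reports the smallest crowd size that reaches the target AUC.

```python
"""Sketch: estimate the crowd size needed to reach a target ROC AUC
when aggregating independent crowdsourced photo annotations."""
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical gold-standard labels (1 = target ad present, 0 = absent),
# standing in for traditional field survey data on the same outlets.
n_photos = 500
truth = rng.binomial(1, 0.4, size=n_photos)

# Assumed per-rater accuracy for a difficult annotation target (hypothetical).
SENSITIVITY = 0.62
SPECIFICITY = 0.65


def crowd_scores(truth, n_raters):
    """Simulate n_raters independent raters per photo and return the
    proportion of raters who marked the target as present."""
    p_yes = np.where(truth == 1, SENSITIVITY, 1.0 - SPECIFICITY)
    yes_votes = rng.binomial(n_raters, p_yes)
    return yes_votes / n_raters


def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positives vs negatives (ties = 0.5)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


TARGET_AUC = 0.90
for n_raters in range(1, 31):
    auc = roc_auc(crowd_scores(truth, n_raters), truth)
    print(f"{n_raters:2d} raters -> AUC = {auc:.3f}")
    if auc >= TARGET_AUC:
        print(f"About {n_raters} raters reach the {TARGET_AUC} AUC target at this task difficulty.")
        break
```

Under these assumed rater accuracies the simulated AUC crosses 0.90 at roughly 10 raters, which is consistent in spirit with the 10 to 15 raters reported above; for easier targets (higher per-rater accuracy) the required crowd size shrinks accordingly, which is the logic behind the adaptive, difficulty-based allocation of raters.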
