Systematically evaluating scientific literature is a time-consuming endeavor that requires hours of coding and rating. Here, we describe a method to distribute these tasks across a large group through online crowdsourcing. Using Amazon's Mechanical Turk, crowdsourced workers (microworkers) completed four groups of tasks to evaluate the question, “Do nutrition-obesity studies with conclusions concordant with popular opinion receive more attention in the scientific community than do those that are discordant?” 1) Microworkers who passed a qualification test (19% passed) evaluated abstracts to determine if they were about human studies investigating nutrition and obesity. Agreement between the first two raters' conclusions was moderate (κ = 0.586), with consensus being reached for 96% of abstracts. 2) Microworkers iteratively synthesized free-text answers describing the studied foods into one coherent term. Approximately 84% of foods were agreed upon, with only 4% and 8% of ratings failing manual review in different steps. 3) Microworkers were asked to rate the perceived obesogenicity of the synthesized food terms. Over 99% of responses were complete and usable, and opinions of the microworkers qualitatively matched the authors' expert expectations (e.g., sugar-sweetened beverages were thought to cause obesity and fruits and vegetables were thought to prevent obesity). 4) Microworkers extracted citation counts for each paper through Google Scholar. Microworkers reached consensus or unanimous agreement for all successful searches. To answer the example question, data were aggregated and analyzed, and showed no significant association between popular opinion and the attention a paper received as measured by Scimago Journal Rank and citation counts. Direct microworker costs totaled $221.75 (estimated cost at minimum wage: $312.61). We discuss important points to consider to ensure good quality control and appropriate pay for microworkers.
With good reliability and low cost, crowdsourcing has the potential to evaluate published literature quickly using existing, easily accessible resources.
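The agreement statistic reported for the abstract-screening task (κ = 0.586 between the first two raters) can be illustrated with a minimal sketch. The abstract does not state which kappa variant or software was used, so the code below assumes Cohen's two-rater kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function name and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels either rater used.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions ("yes" = human nutrition-obesity study).
example_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
example_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]
example_kappa = cohens_kappa(example_a, example_b)
print(example_kappa)
```

In the hypothetical example the raters agree on 6 of 8 items (p_o = 0.75) and chance agreement is p_e = 0.5, so kappa is 0.5. Under the widely used Landis and Koch benchmarks, values from 0.41 to 0.60 are described as moderate agreement, which is consistent with how the reported 0.586 is characterized.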