Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk

We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. Next, we investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.
