Conducting behavioral research on Amazon’s Mechanical Turk

Amazon’s Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this article is to demonstrate how to use this website to conduct behavioral research and to lower the barrier to entry for researchers who could benefit from the platform. We describe general techniques that apply to a variety of types of research and experiments across disciplines. We begin by discussing some of the advantages of doing experiments on Mechanical Turk, including easy access to a large, stable, and diverse subject pool, the low cost of running experiments, and faster iteration between developing theory and executing experiments. While other methods of conducting behavioral research may match or exceed Mechanical Turk on one or more of these axes, we show that, taken as a whole, Mechanical Turk can be a useful tool for many researchers. We discuss how the behavior of workers compares with that of experts and laboratory subjects. We then illustrate the mechanics of putting a task on Mechanical Turk, including recruiting subjects, executing the task, and reviewing the submitted work. Finally, we provide solutions to common problems a researcher may face on this platform, including techniques for conducting synchronous experiments, methods for ensuring high-quality work, ways to keep data private, and ways to maintain code security.
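The mechanics sketched above — posting a task, recruiting workers, and reviewing submissions — are exposed through Mechanical Turk's Requester API. As a minimal illustration (the survey URL, reward amount, and other parameter values below are assumptions for the example, not values from the article), a task hosted on an external site can be described by an ExternalQuestion XML payload plus a handful of parameters for the CreateHIT operation:

```python
# Sketch: assembling the payload for MTurk's CreateHIT operation.
# The survey URL, title, and reward below are illustrative only; in
# practice these parameters would be passed to a client such as
# boto3's "mturk" client, typically against the sandbox endpoint first.

EXTERNAL_QUESTION_XMLNS = (
    "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/"
    "2006-07-14/ExternalQuestion.xsd"
)

def external_question(url: str, frame_height: int = 600) -> str:
    """Wrap an externally hosted task URL in the ExternalQuestion XML
    that MTurk expects; the task page is shown to workers in a frame."""
    return (
        f'<ExternalQuestion xmlns="{EXTERNAL_QUESTION_XMLNS}">'
        f"<ExternalURL>{url}</ExternalURL>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        "</ExternalQuestion>"
    )

def hit_parameters(url: str) -> dict:
    """Assemble keyword arguments for CreateHIT (a sketch, not a
    complete or recommended configuration)."""
    return {
        "Title": "A 5-minute decision-making survey",
        "Description": "Answer a short series of choice questions.",
        "Keywords": "survey, research, experiment",
        "Reward": "0.50",                  # US dollars, passed as a string
        "MaxAssignments": 100,             # number of distinct workers
        "AssignmentDurationInSeconds": 600,
        "LifetimeInSeconds": 3 * 24 * 3600,
        "Question": external_question(url),
    }

params = hit_parameters("https://example.org/my-survey")
# A real call would then be roughly:
#   client = boto3.client("mturk", endpoint_url=SANDBOX_ENDPOINT, ...)
#   hit = client.create_hit(**params)
```

Reviewing submitted work follows the same API pattern: `list_assignments_for_hit` returns each worker's submission, which the requester then accepts with `approve_assignment` (releasing payment) or rejects.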
