CUNY Academic

Amazon Mechanical Turk (AMT) is an online crowdsourcing service where anonymous online workers complete web-based tasks for small sums of money. The service has attracted attention from experimental psychologists interested in gathering human subject data more efficiently. However, relative to traditional laboratory studies, many aspects of the testing environment are not under the experimenter's control. In this paper, we empirically evaluate the fidelity of the AMT system for use in cognitive behavioral experiments. These types of experiments differ from simple surveys in that they require multiple trials, sustained attention from participants, comprehension of complex instructions, and millisecond accuracy for response recording and stimulus presentation. We replicate a diverse body of tasks from experimental psychology, including the Stroop, Switching, Flanker, Simon, Posner Cuing, attentional blink, subliminal priming, and category learning tasks, with participants recruited through AMT. While most of the replications were qualitatively successful and validated the approach of collecting data anonymously online using a web browser, others revealed disparities between laboratory results and online results. A number of important lessons learned in the process of conducting these replications should be of value to other researchers.
