Comparing the Similarity of Responses Received from Studies in Amazon’s Mechanical Turk to Studies Conducted Online and with Direct Recruitment

Computer- and internet-based questionnaires have become a standard tool in Human-Computer Interaction research and in related fields such as psychology and sociology. Amazon’s Mechanical Turk (AMT) service is a comparatively new method for recruiting participants and conducting certain types of experiments. This study examines whether participants recruited through AMT give different responses than participants recruited through an online forum or recruited directly on a university campus. Moreover, we compare whether a study conducted within AMT yields different responses than a study whose participants are recruited through AMT but which is conducted using an external online questionnaire service. The results show a statistically significant difference between responses from participants recruited through AMT and responses from participants recruited on campus or through online forums. We argue, however, that this difference is so small that it has no practical consequence. There was no significant difference between running the study within AMT and running it with an external online questionnaire service, and no significant difference between results obtained directly within AMT and results obtained in the campus and online forum conditions. This suggests that AMT is a viable and economical option for recruiting participants and for conducting studies, since setting up and running a study with AMT generally requires less effort and time than other frequently used methods. We discuss our findings as well as the limitations of using AMT for empirical studies.
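The distinction the abstract draws between statistical and practical significance can be illustrated with a minimal sketch. The counts below are hypothetical (the paper does not publish this table), and the chi-square test with Cramér’s V as an effect size stands in for whatever analysis a given replication might use; with large online samples, a small p-value can coexist with a negligibly small effect.

```python
# Illustrative comparison of categorical (e.g., Likert) response
# distributions between two recruitment conditions.
# NOTE: the counts are hypothetical, for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: recruitment condition (AMT, campus); columns: response categories 1-5.
counts = np.array([
    [12, 25, 40, 30, 13],   # AMT participants (hypothetical)
    [10, 22, 45, 28, 15],   # campus participants (hypothetical)
])

chi2, p, dof, expected = chi2_contingency(counts)

# Cramér's V: effect size for a chi-square test of independence.
# Values near 0 indicate a negligible practical difference, even
# when p is small in very large samples.
n = counts.sum()
cramers_v = np.sqrt(chi2 / (n * (min(counts.shape) - 1)))

print(f"chi2={chi2:.3f}, p={p:.3f}, V={cramers_v:.3f}")
```

Reporting an effect size alongside the test statistic is what lets one say, as the abstract does, that a difference exists but "has no practical consequence."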
