LabintheWild: Conducting Large-Scale Online Experiments With Uncompensated Samples

Web-based experimentation with uncompensated and unsupervised samples has the potential to support the replication, verification, and extension of existing results, as well as the generation of new ones, with larger and more diverse sample populations than previously seen. We introduce the online experiment platform LabintheWild, which provides participants with personalized feedback in exchange for participation in behavioral studies. Compared with conventional in-lab studies, LabintheWild enables the recruitment of participants at a larger scale and from more diverse demographic and geographic backgrounds. We analyze Google Analytics data, participants' comments, and tweets to discuss how participants hear about the platform and why they might choose to participate. Analyzing three example experiments, we additionally show that they replicate previous in-lab study results with comparable data quality.
