Who's the Guinea Pig?: Investigating Online A/B/n Tests in-the-Wild

A/B/n testing has been adopted by many technology companies as a data-driven approach to product design and optimization. These tests are often run on their websites without explicit consent from users. In this paper, we investigate such online A/B/n tests by using Optimizely as a lens. First, we provide measurement results of 575 websites that use Optimizely drawn from the Alexa Top-1M, and analyze the distributions of their audiences and experiments. Then, we use three case studies to discuss potential ethical pitfalls of such experiments, including involvement of political content, price discrimination, and advertising campaigns. We conclude with a suggestion for greater awareness of ethical concerns inherent in human experimentation and a call for increased transparency among A/B/n test operators.
