A/B Testing at Scale: Accelerating Software Innovation

The Internet and the broad digitalization of products and operations provide an unprecedented opportunity to accelerate innovation while applying a rigorous and trustworthy methodology to key product decisions. Developers of connected software, including websites, applications, and devices, can now evaluate ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to back-end algorithms, online controlled experiments are now used to make data-driven decisions at a wide range of companies, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups. The theory of a controlled experiment is simple, but the deployment and evaluation of online controlled experiments at scale (hundreds of concurrently running experiments) across a variety of websites, mobile apps, and desktop applications present many pitfalls and new research challenges for the practitioner. In this tutorial, we will introduce the overall A/B testing methodology, walk through use cases with real examples, and then focus on the practical and research challenges of scaling experimentation. We will share key lessons learned from scaling experimentation at Microsoft to thousands of experiments per year and outline promising directions for future work.
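To make the core evaluation concrete, below is a minimal sketch (not from the tutorial itself) of how a single A/B test result might be analyzed: a two-sided two-proportion z-test comparing conversion rates between control and treatment. The function name and the traffic counts are hypothetical, chosen only for illustration.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    control (A) and treatment (B). Returns (z, p_value)."""
    # Pool the proportions under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided p-value from the standard normal CDF (via math.erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical data: 50,000 users per variant, 3.0% vs. 3.3% conversion.
z, p = two_proportion_ztest(conv_a=1500, n_a=50_000, conv_b=1650, n_b=50_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # z ~ 2.72, p ~ 0.007: significant at 0.05
```

In practice, a platform running hundreds of concurrent experiments layers variance reduction, multiple-testing corrections, and automated validity checks on top of this basic per-test computation, which is where many of the pitfalls the tutorial covers arise.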
