Effective Online Controlled Experiment Analysis at Large Scale

Online Controlled Experiments (OCEs) are the norm in data-driven software companies because of the benefits they provide for building and deploying software. Product teams run experiments to learn accurately whether the changes they make to their products (e.g., adding new features) cause any impact (e.g., customers using them more frequently). Experiments also reduce the risk of deploying software by minimizing the magnitude and duration of harm caused by software bugs, which allows software to be shipped more frequently. To support informed decisions in product development, experiment analysis needs to be granular, covering a large number of metrics across heterogeneous devices and audiences. Discovering experiment insights by hand, however, can be cumbersome. In this paper, based on case study research at a large-scale software development company with a long tradition of experimentation, we (1) describe the standard process of experiment analysis, and (2) introduce an artifact to improve the effectiveness and comprehensiveness of this process.
