Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas

Over the last decade, the information technology industry has adopted a data-driven culture, relying on online metrics to measure and monitor business performance. In big-data settings, most such metrics are approximately normally distributed. This makes it possible to model them directly, without extra model assumptions, and to solve big-data problems with closed-form formulas and distributed algorithms at a fraction of the cost of simulation-based procedures such as the bootstrap. However, certain attributes of the metrics, such as their data-generating processes and aggregation levels, pose numerous challenges for constructing trustworthy estimation and inference procedures. Motivated by four real-life examples in metric development and analytics for large-scale A/B testing, we provide a practical guide to applying the Delta method, one of the most important tools from the classical statistics literature, to address these challenges. We emphasize the central role of the Delta method in metric analytics by highlighting both its classic and novel applications.
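
As a minimal sketch of the kind of closed-form calculation the Delta method enables, the snippet below computes the textbook first-order Delta-method variance of a ratio of two sample means. It illustrates the general technique and is not necessarily one of the four examples treated in the paper. The function name `ratio_delta_variance` and the sample data are hypothetical, and the sketch assumes the (x, y) pairs are independent and identically distributed, for example one pair per randomization unit.

```python
def ratio_delta_variance(xs, ys):
    """Return (ratio, variance) for mean(ys) / mean(xs).

    Uses the first-order Delta-method approximation
        Var(Ybar / Xbar) ~= (s_yy - 2*r*s_xy + r^2*s_xx) / (n * Xbar^2),
    where r = Ybar / Xbar and s_xx, s_yy, s_xy are the sample variances
    and covariance. Assumes the (x, y) pairs are i.i.d.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have equal length of at least 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    if mean_x == 0:
        raise ValueError("mean of xs must be non-zero")

    # Unbiased sample variances and covariance.
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    ratio = mean_y / mean_x
    variance = (var_y - 2 * ratio * cov_xy + ratio ** 2 * var_x) / (n * mean_x ** 2)
    return ratio, variance


# Hypothetical data: page views (xs) and clicks (ys) for five users.
page_views = [10, 12, 8, 15, 9]
clicks = [2, 3, 1, 4, 2]
ctr, ctr_var = ratio_delta_variance(page_views, clicks)
print(f"ratio = {ctr:.4f}, approx. standard error = {ctr_var ** 0.5:.4f}")
```

Every quantity in the formula is a sum over observations, so the same estimate can be assembled from per-partition sums of x, y, x², y², and xy. This is one reason closed-form approaches of this kind fit distributed computation more readily than resampling does.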
