Concise Summarization of Heterogeneous Treatment Effect Using Total Variation Regularized Regression

The randomized controlled experiment has long been accepted as the gold standard for establishing causal links and estimating causal effects in many scientific fields. The average treatment effect is often used to summarize the estimated effect, even though treatment effects are commonly believed to vary among individuals. In the past decade, with the availability of "big data", more and more experiments have large sample sizes and increasingly rich side information, which both enable and require experimenters to discover and understand heterogeneous treatment effects (HTE). HTE understanding has two aspects: one is to predict the effect conditional on a given set of side information or for a given individual; the other is to interpret the HTE structure and summarize it in a memorable way. The former can be treated as a regression problem, while the latter focuses on concise summarization and interpretation. In this paper we propose a method that achieves both at the same time. The method can be formulated as a convex optimization problem, for which we provide a stable and scalable implementation.
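To make the idea of total variation regularization concrete, the following is a minimal sketch and not the paper's formulation or implementation. It assumes a simplified setting that the abstract does not specify: treatment effects have already been estimated separately for a handful of ordered segments (for example, age buckets), and we solve the one-dimensional convex problem of minimizing 0.5*||y - x||^2 + lam*||Dx||_1, where D takes differences between adjacent segments. The function name `tv_denoise_1d`, the ADMM solver, the parameter values, and the input numbers are all illustrative assumptions.

```python
import numpy as np

def tv_denoise_1d(y, lam, rho=1.0, n_iter=2000):
    """Minimize 0.5*||y - x||^2 + lam*||D x||_1 with ADMM.

    D is the first-difference operator, so the penalty is the total
    variation of x across adjacent positions. Illustrative sketch only.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)       # (n-1) x n, row i gives x[i+1] - x[i]
    A = np.eye(n) + rho * D.T @ D        # system matrix for the x-update
    z = np.zeros(n - 1)                  # auxiliary copy of D x
    u = np.zeros(n - 1)                  # scaled dual variable
    for _ in range(n_iter):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))
        v = D @ x + u
        # Soft-thresholding: small adjacent differences are set exactly to zero.
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u = v - z
    return x

# Hypothetical per-segment effect estimates (made-up numbers, not real data).
segment_effects = np.array([0.9, 1.1, 1.0, 1.0, 3.0, 3.1, 2.9, 3.0])
smoothed = tv_denoise_1d(segment_effects, lam=0.3)
print(np.round(smoothed, 3))
```

Because the penalty drives many adjacent differences to exactly zero, neighboring segments tend to be fused into a few groups sharing one effect, which is what makes the resulting summary concise. Larger values of `lam` produce fewer groups, and `lam = 0` returns the input unchanged.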
