Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted L²-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.
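As a rough illustration of what a two-stage procedure of this kind can look like, the sketch below estimates an average treatment effect from simulated data with one scalar covariate and a binary action. It is not the paper's algorithm: the per-action linear function class, the regression weights (g/π)², the sample split, and the augmented plug-in form of the second stage are all assumptions made for this example, as are the names `two_stage_estimate`, `weighted_linear_fit`, `propensity`, `g`, and `true_mu`.

```python
import random


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y ~ b0 + b1 * x; returns (b0, b1)."""
    sw = sum(ws)  # assumes at least one observation with positive weight
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx if sxx > 0 else 0.0
    return my - b1 * mx, b1


def two_stage_estimate(data_fit, data_eval, propensity, g, actions):
    """Estimate tau = E[sum_a g(X, a) * mu(X, a)] from (x, a, y) triples."""
    # Stage 1: fit the outcome function separately for each action,
    # weighting squared errors by (g / pi)^2 (an assumed choice of weights).
    coef = {}
    for a in actions:
        rows = [(x, y) for (x, b, y) in data_fit if b == a]
        xs = [x for x, _ in rows]
        ys = [y for _, y in rows]
        ws = [(g(x, a) / propensity(x, a)) ** 2 for x in xs]
        coef[a] = weighted_linear_fit(xs, ys, ws)

    def mu_hat(x, a):
        b0, b1 = coef[a]
        return b0 + b1 * x

    # Stage 2: plug-in value plus an importance-weighted residual correction,
    # averaged over the held-out half of the sample.
    total = 0.0
    for x, a, y in data_eval:
        plug_in = sum(g(x, b) * mu_hat(x, b) for b in actions)
        correction = g(x, a) / propensity(x, a) * (y - mu_hat(x, a))
        total += plug_in + correction
    return total / len(data_eval)


# Hypothetical simulation setup, used only to exercise the sketch.
def propensity(x, a):
    p1 = 0.3 + 0.4 * x  # probability of action 1 given covariate x
    return p1 if a == 1 else 1.0 - p1


def g(x, a):
    return 1.0 if a == 1 else -1.0  # weights that pick out the ATE


def true_mu(x, a):
    return 1.5 + 3.0 * x if a == 1 else 1.0 + 2.0 * x


if __name__ == "__main__":
    rng = random.Random(0)
    data = []
    for _ in range(2000):
        x = rng.random()
        a = 1 if rng.random() < propensity(x, 1) else 0
        data.append((x, a, true_mu(x, a) + rng.gauss(0.0, 0.5)))
    half = len(data) // 2
    print(two_stage_estimate(data[:half], data[half:], propensity, g, [0, 1]))
```

In this simulation the true effect is E[0.5 + X] = 1 for X uniform on [0, 1], so the printed value should be close to 1 up to sampling noise. Fitting on one half of the sample and averaging over the other keeps the first-stage error independent of the second-stage average, which is a common simplification in analyses of this type.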
