[1] P. Bickel, et al. Efficient and Adaptive Estimation for Semiparametric Models, 1993.
[2] C. Geyer. Estimating Normalizing Constants and Reweighting Mixtures, 1994.
[3] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[4] Jeffrey M. Wooldridge, et al. Asymptotic Properties of Weighted M-Estimators for Standard Stratified Samples, 2001, Econometric Theory.
[5] S. Murphy, et al. Optimal dynamic treatment regimes, 2003.
[6] P. McCullagh, et al. A theory of statistical models for Monte Carlo integration, 2003.
[7] Zhiqiang Tan, et al. On a Likelihood Approach for Monte Carlo Integration, 2004.
[8] A. Tsiatis. Semiparametric Theory and Missing Data, 2006.
[9] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[10] Mark J. van der Laan, et al. Empirical Efficiency Maximization: Improved Locally Efficient Covariate Adjustment in Randomized Experiments and Survival Analysis, 2008, The International Journal of Biostatistics.
[11] M. Davidian, et al. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data, 2009, Biometrika.
[12] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[13] Aapo Hyvärinen, et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, 2010, AISTATS.
[14] Mark J. van der Laan, et al. Cross-Validated Targeted Minimum-Loss-Based Estimation, 2011.
[15] Marie Davidian, et al. Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions, 2013, Biometrika.
[16] Sergey Levine, et al. Offline policy evaluation across representations with applications to educational games, 2014, AAMAS.
[17] John Langford, et al. Doubly Robust Policy Evaluation and Optimization, 2014, arXiv.
[18] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[19] Susan Athey, et al. The Econometrics of Randomized Experiments, 2016, arXiv:1607.00698.
[20] M. J. van der Laan, et al. Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy, 2016, Annals of Statistics.
[21] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[22] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[23] Elias Bareinboim, et al. Causal inference and the data-fusion problem, 2016, Proceedings of the National Academy of Sciences.
[24] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[25] Thorsten Joachims, et al. Effective Evaluation Using Logged Bandit Feedback from Multiple Loggers, 2017, KDD.
[26] John Langford, et al. Off-policy evaluation for slate recommendation, 2016, NIPS.
[27] Nathan Kallus, et al. Recursive Partitioning for Personalization using Observational Data, 2016, ICML.
[28] J. Robins, et al. Double/Debiased Machine Learning for Treatment and Structural Parameters, 2017.
[29] Shota Yasui, et al. Efficient Counterfactual Learning from Bandit Feedback, 2018, AAAI.
[30] Masatoshi Uehara, et al. Analysis of Noise Contrastive Estimation from the Perspective of Asymptotic Variance, 2018, arXiv.
[31] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[32] Nathan Kallus, et al. Balanced Policy Evaluation and Learning, 2017, NeurIPS.
[33] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[34] Li He, et al. Off-policy Learning for Multiple Loggers, 2019, KDD.
[35] Masatoshi Uehara, et al. Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning, 2019, NeurIPS.
[36] Masatoshi Uehara, et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, 2019.
[37] Yi Su, et al. Adaptive Estimator Selection for Off-Policy Evaluation, 2020, ICML.
[38] Masatoshi Uehara, et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, 2019, Journal of Machine Learning Research.
[39] Kelly W. Zhang, et al. Inference for Batched Bandits, 2020, NeurIPS.
[40] Bo Dai, et al. GenDICE: Generalized Offline Estimation of Stationary Values, 2020, ICLR.
[41] Masatoshi Uehara, et al. Statistically Efficient Off-Policy Policy Gradients, 2020, ICML.
[42] Masatoshi Uehara, et al. Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning, 2020, Biometrika.
[43] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[44] J. Honda, et al. Adaptive Experimental Design for Efficient Treatment Effect Estimation: Randomized Allocation via Contextual Bandit Algorithm, 2020, arXiv.
[45] Shu Yang, et al. Combining Multiple Observational Data Sources to Estimate Causal Effects, 2018, Journal of the American Statistical Association.
[46] Hongyuan Zha, et al. Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies, 2020, ICLR.
[47] Krikamol Muandet, et al. Counterfactual Mean Embeddings, 2018, Journal of Machine Learning Research.
[48] Yu Bai, et al. Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning, 2020, arXiv.
[49] Stefan Wager, et al. Confidence intervals for policy evaluation in adaptive experiments, 2019, Proceedings of the National Academy of Sciences.
[50] G. Imbens, et al. Efficient estimation and stratified sampling, 1996.