[1] Shie Mannor et al. Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty, 2012, ICML.
[2] Shu Yang et al. Sensitivity Analysis for Unmeasured Confounding in Coarse Structural Nested Mean Models, 2018, Statistica Sinica.
[3] John Duchi et al. Bounds on the conditional and average treatment effect in the presence of unobserved confounders, 2018.
[4] John N. Tsitsiklis et al. On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies, 2005, Math. Oper. Res.
[5] Elias Bareinboim et al. Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes, 2019, NeurIPS.
[6] Marc G. Bellemare et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift, 2019, AAAI.
[7] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] Marek Petrik et al. RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning, 2014, NIPS.
[9] Qiang Liu et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[10] Leslie Pack Kaelbling et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[11] Philip S. Thomas et al. High Confidence Policy Improvement, 2015, ICML.
[12] Garud Iyengar et al. Robust Dynamic Programming, 2005, Math. Oper. Res.
[13] Masatoshi Uehara et al. Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes, 2019, ArXiv.
[14] Donald K. K. Lee et al. Interval estimation of population means under unknown but bounded probabilities of sample selection, 2013.
[15] Fredrik D. Johansson et al. Guidelines for reinforcement learning in healthcare, 2019, Nature Medicine.
[16] David Sontag et al. Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models, 2019, ICML.
[17] Xiaojie Mao et al. Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding, 2018, AISTATS.
[18] Shie Mannor et al. Off-Policy Evaluation in Partially Observable Environments, 2020, AAAI.
[19] Stephen P. Boyd et al. A tutorial on geometric programming, 2007, Optimization and Engineering.
[20] Nathan Kallus et al. Policy Evaluation with Latent Confounders via Optimal Balance, 2019, NeurIPS.
[21] Eitan Altman et al. Rate of Convergence of Empirical Measures and Costs in Controlled Markov Chains and Transient Optimality, 1994, Math. Oper. Res.
[22] Shie Mannor et al. Consistent On-Line Off-Policy Evaluation, 2017, ICML.
[23] Philip S. Thomas et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[24] Zhiqiang Tan et al. A Distributional Approach for Causal Inference Using Propensity Scores, 2006.
[25] Laurent El Ghaoui et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[26] Arkadi Nemirovski et al. Robust solutions of uncertain linear programs, 1999, Oper. Res. Lett.
[27] Egon Balas et al. Disjunctive programming: Properties of the convex hull of feasible points, 1998.
[28] S. M. Robinson. Stability Theory for Systems of Inequalities. Part I: Linear Systems, 1975.
[29] Doina Precup et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[30] Bernhard Schölkopf et al. Deconfounding Reinforcement Learning in Observational Settings, 2018, ArXiv.
[31] Z. Geng et al. Identifying Causal Effects With Proxy Variables of an Unmeasured Confounder, 2016, Biometrika.
[32] A. Galichon et al. Duality in Dynamic Discrete Choice Models, 2015.
[33] Peter Szolovits et al. Deep Reinforcement Learning for Sepsis Treatment, 2017, ArXiv.
[34] Barbara E. Engelhardt et al. A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units, 2017, UAI.
[35] Nan Jiang et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[36] Marek Petrik et al. Safe Policy Improvement by Minimizing Robust Baseline Regret, 2016, NIPS.
[37] Nathan Kallus et al. Confounding-Robust Policy Improvement, 2018, NeurIPS.
[38] Masatoshi Uehara et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, 2019.
[39] Max Welling et al. Causal Effect Inference with Deep Latent-Variable Models, 2017, NIPS.
[40] Daniel Kuhn et al. Robust Markov Decision Processes, 2013, Math. Oper. Res.
[41] Peter Auer et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[42] E. Altman et al. Markov decision problems and state-action frequencies, 1991.