Importance Sampling Policy Evaluation with an Estimated Behavior Policy
[1] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[2] Shota Yasui,et al. Efficient Counterfactual Learning from Bandit Feedback , 2018, AAAI.
[3] Philip S. Thomas,et al. Importance Sampling for Fair Policy Selection , 2017, UAI.
[4] Peter Stone,et al. Data-Efficient Policy Evaluation Through Behavior Policy Search , 2017, ICML.
[5] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[6] Philip S. Thomas,et al. Safe Reinforcement Learning , 2015 .
[7] Marc G. Bellemare,et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning , 2017, ICLR.
[8] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[10] B. Delyon,et al. Integral approximation by kernel smoothing , 2014, 1409.0733.
[11] Thorsten Joachims,et al. The Self-Normalized Estimator for Counterfactual Learning , 2015, NIPS.
[12] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[14] Zoran Popovic,et al. Offline Evaluation of Online Reinforcement Learning Algorithms , 2016, AAAI.
[15] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[16] N. Chopin,et al. Control Functionals for Monte Carlo Integration , 2014, 1410.2392.
[17] H. Shimodaira,et al. Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .
[19] Shie Mannor,et al. Off-policy Model-based Learning under Unknown Factored Dynamics , 2015, ICML.
[21] Anthony O'Hagan,et al. Monte Carlo is fundamentally unsound , 1987 .
[22] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[23] Lihong Li,et al. Toward Minimax Off-policy Value Estimation , 2015, AISTATS.
[24] P. Rosenbaum. Model-Based Direct Adjustment , 1987 .
[25] Mehrdad Farajtabar,et al. More Robust Doubly Robust Off-policy Evaluation , 2018, ICML.
[26] G. Imbens,et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score , 2002 .
[27] Dirk P. Kroese,et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning , 2004 .
[28] Louis Wehenkel,et al. Model-Free Monte Carlo-like Policy Evaluation , 2010, AISTATS.
[29] Srivatsan Srinivasan,et al. Evaluating Reinforcement Learning Algorithms in Observational Health Settings , 2018, ArXiv.
[30] Carl E. Rasmussen,et al. Bayesian Monte Carlo , 2002, NIPS.
[31] Joaquin Quiñonero Candela,et al. Counterfactual reasoning and learning systems: the example of computational advertising , 2013, J. Mach. Learn. Res..
[34] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[35] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[36] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[37] S. Eguchi,et al. Importance Sampling Via the Estimated Sampler , 2007 .
[38] J. Andrew Bagnell,et al. Agnostic System Identification for Model-Based Reinforcement Learning , 2012, ICML.
[40] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[41] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[42] Philip S. Thomas. Magical Policy Search : Data Efficient Reinforcement Learning with Guarantees of Global Optimality , 2016 .
[43] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[44] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[45] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[47] Qiang Liu,et al. Black-box Importance Sampling , 2016, AISTATS.