Importance Sampling Techniques for Policy Optimization
Alberto Maria Metelli | Matteo Papini | Nico Montali | Marcello Restelli