暂无分享,去创建一个
[1] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[2] H. Robbins. A Stochastic Approximation Method , 1951 .
[3] Sergey Levine,et al. MuProp: Unbiased Backpropagation for Stochastic Neural Networks , 2015, ICLR.
[4] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[5] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[6] Benjamin Recht,et al. Simple random search of static linear policies is competitive for reinforcement learning , 2018, NeurIPS.
[7] Pieter Abbeel,et al. Gradient Estimation Using Stochastic Computation Graphs , 2015, NIPS.
[8] Richard E. Turner,et al. Structured Evolution with Compact Architectures for Scalable Policy Optimization , 2018, ICML.
[9] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[10] David Duvenaud,et al. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation , 2017, ICLR.
[11] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[12] Dengyong Zhou,et al. Action-depedent Control Variates for Policy Optimization via Stein's Identity , 2017 .
[13] Richard E. Turner,et al. Geometrically Coupled Monte Carlo Sampling , 2018, NeurIPS.
[14] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[15] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[16] Alexandre M. Bayen,et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines , 2018, ICLR.
[17] Sergey Levine,et al. The Mirage of Action-Dependent Baselines in Reinforcement Learning , 2018, ICML.
[18] Benjamin Recht,et al. Simple random search provides a competitive approach to reinforcement learning , 2018, ArXiv.
[19] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[20] Sean Gerrish,et al. Black Box Variational Inference , 2013, AISTATS.
[21] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[22] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[23] Jascha Sohl-Dickstein,et al. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models , 2017, NIPS.
[24] Michael I. Jordan,et al. Variational Bayesian Inference with Stochastic Search , 2012, ICML.
[25] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[26] Kenta Oono,et al. Chainer : a Next-Generation Open Source Framework for Deep Learning , 2015 .
[27] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[28] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[29] Yee Whye Teh,et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables , 2016, ICLR.
[30] Martial Hebert,et al. Smoothing-based Optimization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.