Introducing Coordination in Concurrent Reinforcement Learning