暂无分享,去创建一个
[1] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[2] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[3] Filip De Turck,et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2016, NIPS.
[4] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[5] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[6] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[7] Anima Anandkumar,et al. Efficient approaches for escaping higher order saddle points in non-convex optimization , 2016, COLT.
[8] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[9] Jason L. Loeppky,et al. A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit , 2015, ArXiv.
[10] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[11] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[12] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[13] L. Ingber. Very fast simulated re-annealing , 1989 .
[14] Richard Y. Chen,et al. UCB EXPLORATION VIA Q-ENSEMBLES , 2018 .
[15] Karl J. Friston. The free-energy principle: a unified brain theory? , 2010, Nature Reviews Neuroscience.
[16] Junichiro Yoshimoto,et al. Control of exploitation-exploration meta-parameter in reinforcement learning , 2002, Neural Networks.
[17] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[18] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[19] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[20] U. Rieder,et al. Markov Decision Processes , 2010 .
[21] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[22] Andrew Y. Ng,et al. Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.
[23] John Langford,et al. Efficient Exploration in Reinforcement Learning , 2017, Encyclopedia of Machine Learning and Data Mining.
[24] Xing Wang. On Advances in Deep Learning with Applications in Financial Market Modeling , 2020 .