Exploration versus Exploitation in Reinforcement Learning: A Stochastic Control Approach
[1] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[2] Razvan Pascanu, et al. Learning to Navigate in Complex Environments, 2016, ICLR.
[3] Emanuel Todorov, et al. Iterative linearization methods for approximately optimal control and estimation of non-linear stochastic system, 2007, Int. J. Control.
[4] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[5] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[6] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[7] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[8] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[9] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.
[10] Benjamin Recht, et al. A Tour of Reinforcement Learning: The View from Continuous Control, 2018, Annu. Rev. Control Robot. Auton. Syst.
[11] T. Kurtz, et al. Stationary Solutions and Forward Equations for Controlled and Singular Martingale Problems, 2001.
[12] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[13] Kenji Doya. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[14] C. E. Shannon. A Mathematical Theory of Communication, 1948, Bell Syst. Tech. J.
[15] Cyrus Derman. Finite State Markovian Decision Processes, 1970.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[17] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[18] Benjamin Van Roy, et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration, 2013, NIPS.
[19] E. Todorov, et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, 2005, Proceedings of the 2005 American Control Conference.
[20] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[21] Sang Joon Kim, et al. A Mathematical Theory of Communication, 2006.
[22] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[23] Nicole El Karoui, et al. Compactification methods in the control of degenerate diffusions: existence of an optimal control, 1987.
[24] Xiongzhi Chen. Brownian Motion and Stochastic Calculus, 2008.
[25] A. Mandelbaum, et al. Multi-armed bandits in discrete and continuous time, 1998.
[26] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[27] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[28] W. Fleming, et al. On stochastic relaxed control for partially observed diffusions, 1984, Nagoya Mathematical Journal.
[29] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[30] T. Kurtz, et al. Existence of Markov Controls and Characterization of Optimal Markov Controls, 1998.
[31] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[32] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[33] A. Mandelbaum. Continuous Multi-armed Bandits and Multiparameter Processes, 1987.
[34] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[35] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[36] Xun Yu Zhou. On the existence of optimal relaxed controls of stochastic partial differential equations, 1992.
[37] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[38] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[39] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.