Expert Selection in High-Dimensional Markov Decision Processes
S. Shankar Sastry | Claire J. Tomlin | Roy Dong | Vicenç Rúbies Royo | Eric V. Mazumdar
[1] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[2] S. Shankar Sastry,et al. A Multi-Armed Bandit Approach for Online Expert Selection in Markov Decision Processes , 2017, ArXiv.
[3] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[4] Danica Kragic,et al. Multi-armed bandit models for 2D grasp planning with uncertainty , 2015, 2015 IEEE International Conference on Automation Science and Engineering (CASE).
[5] Mingyan Liu,et al. Online Learning of Rested and Restless Bandits , 2011, IEEE Transactions on Information Theory.
[6] T. L. Lai and H. Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985, Adv. Appl. Math..
[7] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[8] Yu-Jin Zhang,et al. A Highly Effective Impulse Noise Detection Algorithm for Switching Median Filters , 2010, IEEE Signal Processing Letters.
[9] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[10] Razvan Pascanu,et al. Metacontrol for Adaptive Imagination-Based Optimization , 2017, ICLR.
[11] Lina J. Karam,et al. Understanding how image quality affects deep neural networks , 2016, 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX).
[12] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[13] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[14] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[15] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[16] Yishay Mansour,et al. Experts in a Markov Decision Process , 2004, NIPS.
[17] Firas Ajil Jassim,et al. Image Denoising Using Interquartile Range Filter with Local Averaging , 2013, ArXiv.
[18] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[19] Peter Auer,et al. Regret bounds for restless Markov bandits , 2012, Theor. Comput. Sci..
[20] Sangram Ganguly,et al. Learning Sparse Feature Representations Using Probabilistic Quadtrees and Deep Belief Nets , 2015, Neural Processing Letters.
[21] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.