[1] David L. St-Pierre et al. Online Sparse Bandit for Card Games, 2011, ACG.
[2] Chiyuan Zhang et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[3] Sergey Levine et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[4] Benjamin D. Haeffele, René Vidal. Global Optimality in Tensor Factorization, Deep Learning, and Beyond, 2015, ArXiv.
[5] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[6] Kazuoki Azuma. Weighted Sums of Certain Dependent Random Variables, 1967, Tohoku Mathematical Journal.
[7] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963, Journal of the American Statistical Association.
[8] T. L. Lai, H. Robbins. Asymptotically efficient adaptive allocation rules, 1985, Advances in Applied Mathematics.
[9] Maithra Raghu et al. On the Expressive Power of Deep Neural Networks, 2016, ICML.
[10] John Schulman et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[11] Michael Laskey et al. Multi-armed bandit models for 2D grasp planning with uncertainty, 2015, 2015 IEEE International Conference on Automation Science and Engineering (CASE).
[12] Jessica B. Hamrick et al. Metacontrol for Adaptive Imagination-Based Optimization, 2017, ICLR.
[13] Ian Osband et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[14] M. Rothschild. A two-armed bandit theory of market pricing, 1974, Journal of Economic Theory.
[15] O. Kallenberg. Foundations of Modern Probability, 2021, Probability Theory and Stochastic Modelling.
[16] Sébastien Bubeck et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[17] Cem Tekin, Mingyan Liu. Online Learning of Rested and Restless Bandits, 2011, IEEE Transactions on Information Theory.
[18] Volodymyr Mnih et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[19] H. Robbins. Some aspects of the sequential design of experiments, 1952, Bulletin of the American Mathematical Society.
[20] Pyry Matikainen et al. Multi-armed recommendation bandits for selecting state machine policies for robotic systems, 2013, 2013 IEEE International Conference on Robotics and Automation.
[21] Ronald Ortner et al. Regret bounds for restless Markov bandits, 2012, Theor. Comput. Sci.
[22] R. Simon. Optimal two-stage designs for phase II clinical trials, 1989, Controlled Clinical Trials.
[23] Volodymyr Mnih et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[24] John Schulman et al. Trust Region Policy Optimization, 2015, ICML.