Open Loop Optimistic Planning
[1] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.
[2] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[3] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.
[4] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[5] Rémi Munos, et al. Algorithms for Infinitely Many-Armed Bandits, 2008, NIPS.
[6] Rémi Munos, et al. Optimistic Planning of Deterministic Systems, 2008, EWRL.
[7] Nan Rong, et al. What makes some POMDP problems easy to approximate?, 2007, NIPS.
[8] Peter Auer, et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.
[9] Cordelia Schmid, et al. Bandit Algorithms for Tree Search, 2007, UAI.
[10] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[11] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[12] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[13] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[14] Sheldon M. Ross. Stochastic Processes, Wiley.
[15] Robert D. Kleinberg, et al. Multi-Armed Bandits in Metric Spaces, 2008.
[16] Olivier Teytaud, et al. Modification of UCT with Patterns in Monte-Carlo Go, 2006.
[17] Irini Angelidaki, et al. Anaerobic Digestion Model No. 1 (ADM1), 2002.
[18] Ying He, et al. Simulation-Based Algorithms for Markov Decision Processes, 2002.
[19] Y. Freund, et al. The Non-stochastic Multi-armed Bandit Problem, 2001.
[20] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963, Journal of the American Statistical Association.
[21] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.