Paper information - Pure exploration in finitely-armed and continuous-armed bandits

Pure exploration in finitely-armed and continuous-armed bandits

Rémi Munos | Sébastien Bubeck | Gilles Stoltz

[1] Sonja Kuhnt, et al. Design and analysis of computer experiments, 2010.

[2] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.

[3] Aleksandrs Slivkins, et al. Sharp dichotomies for regret minimization in metric spaces, 2009, SODA '10.

[4] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.

[5] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.

[6] A. Tsybakov, et al. Gap-free Bounds for Stochastic Multi-Armed Bandit, 2008.

[7] Rémi Munos, et al. Bandit Algorithms for Tree Search, 2007, UAI.

[8] András György, et al. Continuous Time Associative Bandit Problems, 2007, IJCAI.

[9] Peter Auer, et al. Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring, 2006, ALT.

[10] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.

[11] K. Schlag. ELEVEN - Tests needed for a Recommendation, 2006.

[12] Olivier Teytaud, et al. Modification of UCT with Patterns in Monte-Carlo Go, 2006.

[13] Gilles Stoltz. Incomplete information and internal regret in prediction of individual sequences, 2005.

[14] Gábor Lugosi, et al. Minimizing regret with label efficient prediction, 2004, IEEE Transactions on Information Theory.

[15] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[16] Russell Greiner, et al. The Budgeted Multi-armed Bandit Problem, 2004, COLT.

[17] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[18] John N. Tsitsiklis, et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, 2004, J. Mach. Learn. Res.

[19] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.

[20] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[21] Luc Devroye, et al. Combinatorial methods in density estimation, 2001, Springer series in statistics.

[22] Y. Freund, et al. The non-stochastic multi-armed bandit problem, 2001.

[23] Chun-Hung Chen, et al. Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization, 2000, Discret. Event Dyn. Syst.

[24] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.

[25] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.

[26] P. Billingsley, et al. Convergence of Probability Measures, 1970, The Mathematical Gazette.

[27] W. Hoeffding. Probability inequalities for sums of bounded random variables, 1963.

[28] H. Robbins. Some aspects of the sequential design of experiments, 1952.