From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning
[1] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.
[2] W. R. Thompson. On the Theory of Apportionment, 1935.
[3] Kazuoki Azuma. Weighted sums of certain dependent random variables, 1967.
[4] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[5] Nils J. Nilsson, et al. Principles of Artificial Intelligence, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Lamberto Cesari, et al. Optimization: Theory and Applications, 1983.
[7] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[8] Henryk Wozniakowski, et al. Information-based complexity, 1987, Nature.
[9] H. Wozniakowski. Information-Based Complexity, 1988.
[10] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.
[11] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[12] A. Neumaier. Interval Methods for Systems of Equations, 1990.
[13] Bruce Abramson, et al. Expected-Outcome: A General Model of Static Evaluation, 1990, IEEE Trans. Pattern Anal. Mach. Intell.
[14] Eldon Hansen, et al. Global Optimization Using Interval Analysis, 1992, Pure and Applied Mathematics.
[15] J. Banks, et al. Denumerable-Armed Bandits, 1992.
[16] R. Horst, et al. Global Optimization: Deterministic Approaches, 1992.
[17] Bernd Brügmann. Monte Carlo Go, 1993.
[18] C. D. Perttunen, et al. Lipschitzian optimization without the Lipschitz constant, 1993.
[19] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[20] R. Agrawal. The Continuum-Armed Bandit Problem, 1995.
[21] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[22] A. Burnetas, et al. Optimal Adaptive Policies for Sequential Allocation Problems, 1996.
[23] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[24] R. B. Kearfott. Rigorous Global Search: Continuous Problems, 1996.
[25] Robert W. Chen, et al. Bandit problems with infinitely many arms, 1997.
[26] J. D. Pintér. Global Optimization in Action. Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications, 1996.
[27] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[28] Alex M. Andrew, et al. Reinforcement Learning: An Introduction, 1998.
[29] Eric A. Hansen, et al. A Heuristic Search Algorithm for Markov Decision Problems, 1999.
[30] Y. D. Sergeyev, et al. Global Optimization with Non-Convex Constraints: Sequential and Parallel Algorithms, 2000, Nonconvex Optimization and Its Applications, Volume 45.
[31] C. T. Kelley, et al. Modifications of the DIRECT algorithm, 2001.
[32] Bruno Bouzy, et al. Computer Go: An AI oriented survey, 2001, Artif. Intell.
[33] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[34] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[35] Jan M. Maciejowski, et al. Predictive Control with Constraints, 2002.
[36] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[37] Marko Bacic, et al. Model predictive control, 2003.
[38] Bruno Bouzy, et al. Monte-Carlo Go Developments, 2003, ACG.
[39] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.
[40] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[41] Frédérick Garcia, et al. On-Line Search for Solving Markov Decision Processes via Heuristic Sampling, 2004, ECAI.
[42] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[43] D. Finkel, et al. Convergence analysis of the DIRECT algorithm, 2004.
[44] Frédérick Garcia. On-line search for solving large Markov decision processes, 2004.
[45] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, 2004, Applied Optimization.
[46] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.
[47] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[48] Tao Wang, et al. Bayesian sparse sampling for on-line reward optimization, 2005, ICML.
[49] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[50] Steven M. LaValle. Planning Algorithms, 2006.
[51] Olivier Teytaud, et al. Modification of UCT with Patterns in Monte-Carlo Go, 2006.
[52] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[53] Joelle Pineau, et al. Anytime Point-Based Approximations for Large POMDPs, 2006, J. Artif. Intell. Res.
[54] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[55] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[56] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.
[57] Rémi Munos, et al. Bandit Algorithms for Tree Search, 2007, UAI.
[58] Tapio Elomaa, et al. Following the Perturbed Leader to Gamble at Multi-armed Bandits, 2007, ALT.
[59] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[60] Peter Auer, et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.
[61] Sylvain Gelly, et al. Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007, IEEE Symposium on Computational Intelligence and Games.
[62] Joelle Pineau, et al. Online Planning Algorithms for POMDPs, 2008, J. Artif. Intell. Res.
[63] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[64] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.
[65] Louis Wehenkel, et al. Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees, 2008, EWRL.
[66] Rémi Munos, et al. Algorithms for Infinitely Many-Armed Bandits, 2008, NIPS.
[67] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.
[68] Rémi Munos, et al. Optimistic Planning of Deterministic Systems, 2008, EWRL.
[69] Tzung-Pei Hong, et al. The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments, 2009, IEEE Transactions on Computational Intelligence and AI in Games.
[70] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.
[71] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[72] David Silver. Reinforcement learning and simulation-based search in computer Go, 2009.
[73] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[74] Akimichi Takemura, et al. An Asymptotically Optimal Bandit Algorithm for Bounded Support Models, 2010, COLT.
[75] Thomas Hérault, et al. Scalability and Parallelization of Monte-Carlo Tree Search, 2010, Computers and Games.
[76] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[77] Aurélien Garivier, et al. Parametric Bandits: The Generalized Linear Case, 2010, NIPS.
[78] Rémi Munos, et al. Open Loop Optimistic Planning, 2010, COLT.
[79] Sébastien Bubeck. Bandits Games and Clustering Foundations, 2010.
[80] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[81] Olivier Teytaud, et al. Biasing Monte-Carlo Simulations through RAVE Values, 2010, Computers and Games.
[82] Guillaume Maurice Jean-Bernard Chaslot. Monte-Carlo Tree Search, 2010.
[83] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[84] Olivier Buffet, et al. Markov Decision Processes in Artificial Intelligence, 2010.
[85] Joel Veness, et al. Monte-Carlo Planning in Large POMDPs, 2010, NIPS.
[86] Odalric-Ambrym Maillard. Apprentissage séquentiel : bandits, statistique et renforcement (Sequential Learning: Bandits, Statistics and Reinforcement), 2011.
[87] Sham M. Kakade, et al. Stochastic Convex Optimization with Bandit Feedback, 2011, SIAM J. Optim.
[88] Aleksandrs Slivkins, et al. Multi-armed bandits on implicit metric spaces, 2011, NIPS.
[89] Michael L. Littman, et al. Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search, 2011, UAI.
[90] Robert Babuška, et al. Optimistic Planning in Markov Decision Processes, 2011.
[91] Csaba Szepesvári, et al. X-Armed Bandits, 2011, J. Mach. Learn. Res.
[92] Rémi Munos. Optimistic Optimization of Deterministic Functions, 2011, NIPS.
[93] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[94] Anne Auger, et al. Theory of Randomized Search Heuristics: Foundations and Recent Developments, 2011.
[95] Rémi Munos, et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences, 2011, COLT.
[96] Aleksandrs Slivkins. Contextual Bandits with Similarity Information, 2009, COLT.
[97] Jia Yuan Yu, et al. Lipschitz Bandits without the Lipschitz Constant, 2011, ALT.
[98] David Silver, et al. Monte-Carlo tree search and rapid action value estimation in computer Go, 2011, Artif. Intell.
[99] Akimichi Takemura, et al. An asymptotically optimal policy for finite support models in the multiarmed bandit problem, 2009, Machine Learning.
[100] U. Rieder, et al. Markov Decision Processes with Applications to Finance, 2011.
[101] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[102] Bart De Schutter, et al. Optimistic planning for sparsely stochastic systems, 2011, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[103] Joelle Pineau, et al. A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes, 2011, J. Mach. Learn. Res.
[104] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[105] Rémi Munos, et al. Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit, 2012, AISTATS.
[106] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[107] Peter Dayan, et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search, 2012, NIPS.
[108] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[109] Peter Vrancx, et al. Reinforcement Learning: State-of-the-Art, 2012.
[110] Csaba Szepesvári, et al. Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits, 2012, AISTATS.
[111] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[112] Lucian Busoniu, et al. Optimistic planning for Markov decision processes, 2012, AISTATS.
[113] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[114] Martin J. Wainwright, et al. Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization, 2010, IEEE Transactions on Information Theory.
[115] R. Munos, et al. Kullback-Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.
[116] Rémi Munos, et al. Stochastic Simultaneous Optimistic Optimization, 2013, ICML.
[117] Lucian Busoniu, et al. Optimistic planning for belief-augmented Markov Decision Processes, 2013, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[118] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[119] Hyeong Soo Chang, et al. Simulation-Based Algorithms for Markov Decision Processes, 2013.
[120] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[121] David Q. Mayne, et al. Model predictive control: Recent developments and future promise, 2014, Autom.
[122] Adam D. Bull. Adaptive-treed bandits, 2013, arXiv:1302.2489.
[123] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.