Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] M. A. Girshick,et al. Bayes and minimax solutions of sequential decision problems , 1949 .
[3] Philip Wolfe,et al. Contributions to the theory of games , 1953 .
[4] J. Kiefer,et al. Stochastic Estimation of the Maximum of a Regression Function , 1952 .
[5] J. Kiefer,et al. Sequential minimax search for a maximum , 1953 .
[6] James Hannan. Approximation to Bayes Risk in Repeated Play, 1958.
[7] A. Banos. On Pseudo-Games , 1968 .
[8] J. Andel. Sequential Analysis , 2022, The SAGE Encyclopedia of Research Design.
[9] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[10] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[11] Christian M. Ernst,et al. Multi-armed Bandit Allocation Indices , 1989 .
[12] J. Bather,et al. Multi‐Armed Bandit Allocation Indices , 1990 .
[13] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[14] Robert W. Chen,et al. Bandit problems with infinitely many arms , 1997 .
[15] S. Hart,et al. A simple adaptive procedure leading to correlated equilibrium , 2000 .
[16] Dale Schuurmans,et al. General Convergence Results for Linear Discriminant Updates , 1997, COLT '97.
[17] David P. Helmbold,et al. Some label efficient learning results , 1997, COLT '97.
[18] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[19] K. Ball. An Elementary Introduction to Modern Convex Geometry , 1997 .
[20] P. Lezaud. Chernoff-type bound for finite Markov chains , 1998 .
[21] Philip M. Long,et al. Associative Reinforcement Learning using Linear Probabilistic Concepts , 1999, ICML.
[22] Andreu Mas-Colell,et al. A General Class of Adaptive Strategies , 1999, J. Econ. Theory.
[23] Y. Freund,et al. The non-stochastic multi-armed bandit problem , 2001 .
[24] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[25] Shie Mannor,et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes , 2002, COLT.
[26] Manfred K. Warmuth,et al. Path Kernels and Multiplicative Updates , 2002, J. Mach. Learn. Res..
[27] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[28] Sham M. Kakade. On the sample complexity of reinforcement learning, 2003.
[29] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[30] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[31] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[32] Manfred K. Warmuth,et al. Relative Loss Bounds for Multidimensional Regression Problems , 1997, Machine Learning.
[33] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[34] Mark Herbster,et al. Tracking the Best Expert , 1995, Machine Learning.
[35] Baruch Awerbuch,et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches , 2004, STOC '04.
[36] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[37] Avrim Blum,et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary , 2004, COLT.
[38] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem , 2004, NIPS.
[39] J. Hiriart-Urruty,et al. Fundamentals of Convex Analysis , 2004 .
[40] O. Bousquet. THEORY OF CLASSIFICATION: A SURVEY OF RECENT ADVANCES , 2004 .
[41] Eric Deeson,et al. Online learning , 2005, Br. J. Educ. Technol..
[42] H. Vincent Poor,et al. Bandit problems with side observations , 2005, IEEE Transactions on Automatic Control.
[43] Adam Tauman Kalai,et al. Online convex optimization in the bandit setting: gradient descent without a gradient , 2004, SODA '05.
[44] Alexander V. Nazin,et al. Recursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging , 2005, Probl. Inf. Transm..
[45] Gábor Lugosi,et al. Minimizing regret with label efficient prediction , 2004, IEEE Transactions on Information Theory.
[46] Sanjeev R. Kulkarni,et al. Arbitrary side observations in bandit problems , 2005, Adv. Appl. Math..
[47] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[48] S. Boucheron,et al. Theory of classification : a survey of some recent advances , 2005 .
[49] Gilles Stoltz. Incomplete information and internal regret in prediction of individual sequences , 2005 .
[50] Peter Auer,et al. Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring , 2006, ALT.
[51] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[52] Olivier Teytaud,et al. Modification of UCT with Patterns in Monte-Carlo Go , 2006 .
[53] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[54] Yishay Mansour,et al. Improved second-order bounds for prediction with expert advice , 2006, Machine Learning.
[55] Nimrod Megiddo,et al. Combining expert advice in reactive environments , 2006, JACM.
[56] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[57] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[58] Rémi Munos,et al. Bandit Algorithms for Tree Search , 2007, UAI.
[59] H. Robbins. A Stochastic Approximation Method , 1951 .
[60] András György,et al. Continuous Time Associative Bandit Problems , 2007, IJCAI.
[61] Manfred K. Warmuth,et al. Learning Permutations with Exponential Weights , 2007, COLT.
[62] Tamás Linder,et al. The On-Line Shortest Path Problem Under Partial Monitoring , 2007, J. Mach. Learn. Res..
[63] J. Langford,et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[64] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[65] Peter Auer,et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem , 2007, COLT.
[66] Thomas P. Hayes,et al. The Price of Bandit Information for Online Optimization , 2007, NIPS.
[67] Shai Shalev-Shwartz. Online learning: theory, algorithms and applications, 2007.
[68] Ambuj Tewari,et al. Efficient bandit algorithms for online multiclass prediction , 2008, ICML '08.
[69] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[70] Demosthenis Teneketzis,et al. Multi-Armed Bandit Problems , 2008 .
[71] Shie Mannor,et al. Markov Decision Processes with Arbitrary Reward Processes , 2008, Math. Oper. Res..
[72] Eli Upfal,et al. Adapting to a Changing Environment: the Brownian Restless Bandits , 2008, COLT.
[73] Varun Grover,et al. Active Learning in Multi-armed Bandits , 2008, ALT.
[74] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[75] Eli Upfal,et al. Multi-Armed Bandits in Metric Spaces, 2008.
[76] Manfred K. Warmuth,et al. Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension , 2008 .
[77] Elad Hazan,et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization , 2008, COLT.
[78] Rémi Munos,et al. Algorithms for Infinitely Many-Armed Bandits , 2008, NIPS.
[79] Csaba Szepesvári,et al. Online Optimization in X-Armed Bandits , 2008, NIPS.
[80] Thomas P. Hayes,et al. High-Probability Regret Bounds for Bandit Online Linear Optimization , 2008, COLT.
[81] Olivier Teytaud,et al. Creating an Upper-Confidence-Tree Program for Havannah , 2009, ACG.
[82] Thorsten Joachims,et al. The K-armed Dueling Bandits Problem , 2012, COLT.
[83] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.
[84] Matthew J. Streeter,et al. Tighter Bounds for Multi-Armed Bandits with Expert Advice , 2009, COLT.
[85] Moshe Babaioff,et al. Characterizing truthful multi-armed bandit mechanisms: extended abstract , 2009, EC '09.
[86] Csaba Szepesvári,et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits , 2009, Theor. Comput. Sci..
[87] Nicolò Cesa-Bianchi,et al. Combinatorial Bandits , 2012, COLT.
[88] Alexander Shapiro,et al. Robust Stochastic Approximation Approach to Stochastic Programming, 2009, SIAM J. Optim..
[89] Jacob D. Abernethy,et al. Beating the adaptive bandit with high probability , 2009, 2009 Information Theory and Applications Workshop.
[90] Rémi Munos,et al. Pure Exploration in Multi-armed Bandits Problems , 2009, ALT.
[91] Varun Kanade,et al. Sleeping Experts and Bandits with Stochastic Action Availability and Adversarial Rewards , 2009, AISTATS.
[92] Nikhil R. Devanur,et al. The price of truthfulness for pay-per-click auctions , 2009, EC '09.
[93] Eric W. Cope,et al. Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems , 2009, IEEE Transactions on Automatic Control.
[94] Moshe Babaioff,et al. Truthful mechanisms with implicit payment computation , 2010, EC '10.
[95] Andreas Krause,et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.
[96] Akimichi Takemura,et al. An Asymptotically Optimal Bandit Algorithm for Bounded Support Models, 2010, COLT.
[97] B. Kégl,et al. Fast boosting using adversarial bandits , 2010, ICML.
[98] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[99] Aurélien Garivier,et al. Parametric Bandits: The Generalized Linear Case , 2010, NIPS.
[100] Rémi Munos,et al. Open Loop Optimistic Planning , 2010, COLT.
[101] Peter Auer,et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..
[102] Lin Xiao,et al. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback, 2010, COLT.
[103] Olivier Teytaud,et al. Bandit-Based Genetic Programming , 2010, EuroGP.
[104] Peter L. Bartlett,et al. Optimal Allocation Strategies for the Dark Pool Problem , 2010, AISTATS.
[105] Jean-Yves Audibert,et al. Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..
[106] John Shawe-Taylor,et al. Regret Bounds for Gaussian Process Bandit Problems, 2010, AISTATS.
[107] Bhaskar Krishnamachari,et al. Dynamic Multichannel Access With Imperfect Channel State Detection , 2010, IEEE Transactions on Signal Processing.
[108] Robert E. Schapire,et al. Non-Stochastic Bandit Slate Problems , 2010, NIPS.
[109] John L. Nazareth,et al. Introduction to derivative-free optimization , 2010, Math. Comput..
[110] Sébastien Bubeck. Bandits Games and Clustering Foundations , 2010 .
[111] Dominik D. Freydenberger,et al. Can We Learn to Gamble Efficiently? , 2010, COLT.
[112] Csaba Szepesvári,et al. Toward a classification of finite partial-monitoring games , 2010, Theor. Comput. Sci..
[113] Atsuyoshi Nakamura,et al. Algorithms for Adversarial Bandit Problems with Multiple Plays , 2010, ALT.
[114] Robert D. Kleinberg,et al. Regret bounds for sleeping experts and bandits , 2010, Machine Learning.
[115] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[116] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[117] Wouter M. Koolen,et al. Hedging Structured Concepts , 2010, COLT.
[118] John Shawe-Taylor,et al. PAC-Bayesian Analysis of Contextual Bandits , 2011, NIPS.
[119] John Langford,et al. Contextual Bandit Algorithms with Supervised Learning Guarantees , 2010, AISTATS.
[120] Aurélien Garivier,et al. Optimally Sensing a Single Channel Without Prior Information: The Tiling Algorithm and Regret Bounds , 2011, IEEE Journal of Selected Topics in Signal Processing.
[121] Sham M. Kakade,et al. Stochastic Convex Optimization with Bandit Feedback , 2011, SIAM J. Optim..
[122] Eric Moulines,et al. On Upper-Confidence Bound Policies for Switching Bandit Problems , 2011, ALT.
[123] Rémi Munos,et al. Pure exploration in finitely-armed and continuous-armed bandits , 2011, Theor. Comput. Sci..
[124] Thorsten Joachims,et al. Beat the Mean Bandit , 2011, ICML.
[125] Gábor Lugosi,et al. Minimax Policies for Combinatorial Prediction Games , 2011, COLT.
[126] Elad Hazan,et al. Better Algorithms for Benign Bandits , 2009, J. Mach. Learn. Res..
[127] Csaba Szepesvári,et al. X-Armed Bandits, 2011, J. Mach. Learn. Res..
[128] Rémi Munos,et al. Adaptive Bandits: Towards the best history-dependent strategy , 2011, AISTATS.
[129] Sébastien Bubeck,et al. Introduction to Online Optimization , 2011 .
[130] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[131] Rémi Munos,et al. Finite Time Analysis of Stratified Sampling for Monte Carlo , 2011, NIPS.
[132] Vianney Perchet,et al. The multi-armed bandit problem with covariates , 2011, ArXiv.
[133] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[134] Shie Mannor,et al. From Bandits to Experts: On the Value of Side-Observations , 2011, NIPS.
[135] Elad Hazan,et al. Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction , 2011, NIPS.
[136] Alessandro Lazaric,et al. Multi-Bandit Best Arm Identification , 2011, NIPS.
[137] Umar Syed,et al. Bandits, Query Learning, and the Haystack Dimension , 2011, COLT.
[138] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[139] Shie Mannor,et al. Unimodal Bandits , 2011, ICML.
[140] Rémi Munos,et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences , 2011, COLT.
[141] Aleksandrs Slivkins,et al. Contextual Bandits with Similarity Information , 2009, COLT.
[142] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[143] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[144] Jean-Yves Audibert,et al. Deviations of Stochastic Bandit Regret , 2011, ALT.
[145] Alessandro Lazaric,et al. Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits , 2011, ALT.
[146] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[147] Elad Hazan. The convex optimization approach to regret minimization , 2011 .
[148] Shie Mannor,et al. Committing Bandits , 2011, NIPS.
[149] Peter L. Bartlett,et al. Oracle inequalities for computationally budgeted model selection , 2011, COLT.
[150] Ambuj Tewari,et al. On the Universality of Online Mirror Descent , 2011, NIPS.
[151] Koby Crammer,et al. Multiclass classification with bandit feedback using adaptive regularization , 2012, Machine Learning.
[152] Mingyan Liu,et al. Online Learning of Rested and Restless Bandits , 2011, IEEE Transactions on Information Theory.
[153] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[154] Sham M. Kakade,et al. Towards Minimax Policies for Online Linear Optimization with Bandit Feedback , 2012, COLT.
[155] Aleksandrs Slivkins,et al. The Best of Both Worlds: Stochastic and Adversarial Bandits, 2012, COLT.
[156] Ambuj Tewari,et al. Regularization Techniques for Learning with Matrices , 2009, J. Mach. Learn. Res..
[157] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[158] Dean P. Foster,et al. No Internal Regret via Neighborhood Watch , 2011, AISTATS.
[159] Damien Ernst,et al. Optimal discovery with probabilistic expert advice , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[160] Thomas Steinke,et al. Learning hurdles for sleeping experts , 2012, ITCS '12.
[161] Ambuj Tewari,et al. Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret , 2012, ICML.
[162] Sanjeev Arora,et al. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications , 2012, Theory Comput..
[163] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[164] R. Munos,et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.
[165] Nicolò Cesa-Bianchi,et al. Bandits With Heavy Tail , 2012, IEEE Transactions on Information Theory.
[166] Sébastien Bubeck,et al. Multiple Identifications in Multi-Armed Bandits , 2012, ICML.
[167] Peter Auer,et al. Regret bounds for restless Markov bandits , 2012, Theor. Comput. Sci..
[168] Wouter M. Koolen,et al. Combining initial segments of lists , 2011, Theor. Comput. Sci..
[169] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.
[170] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.